Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on March 17th, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on June 10th, 2025.
  • The first revision was submitted on August 22nd, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on September 30th, 2025.

Version 0.2 (accepted)

· Sep 30, 2025 · Academic Editor

Accept

Dear Author,

Your paper has been revised. It has been accepted for publication in PeerJ Computer Science. Thank you for your fine contribution.

[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]

Reviewer 1 ·

Basic reporting

As a result of the revisions, the Methods section has been reorganized according to a logical workflow, and critical details such as genome version, normalization method, and class distributions have been added to the dataset description. The Discussion has been strengthened with literature comparisons, explicit acknowledgment of limitations, clinical applicability, and SHAP-based visualizations, thereby clarifying both the biological and clinical context. Language use has been refined to meet academic standards, and the reference list has been updated with relevant works and corrected for formatting issues. The claim of novelty has been reinforced through comparisons with state-of-the-art approaches such as TabNet and AutoGluon, as well as the added contribution of SHAP-based interpretability.

Experimental design

The authors revised the methodology section following a logical sequence, presenting the research process in a systematic and transparent manner.

Data Description: The RNA-seq dataset, sample types, and key characteristics are clearly defined, providing context for subsequent analyses.

Preprocessing: Steps such as Spearman-based outlier removal, ANOVA filtering, normalization, and label encoding are presented in a clear, logical order.
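For illustration, those two filtering steps might look as follows (a minimal sketch using SciPy/scikit-learn; the correlation threshold, the number of retained genes, and the data layout are assumptions, not details taken from the manuscript):

```python
import pandas as pd
from scipy.stats import spearmanr
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import LabelEncoder

def remove_outlier_samples(X: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Drop samples whose mean Spearman correlation with the other samples
    falls below the threshold (the 0.8 cutoff is an assumption)."""
    corr, _ = spearmanr(X.T)                                  # samples-by-samples matrix
    mean_corr = (corr.sum(axis=1) - 1) / (corr.shape[0] - 1)  # exclude self-correlation
    return X.loc[mean_corr >= threshold]

# Assumed layout: X is a samples-by-genes expression matrix, y the cancer-type labels.
# X_clean = remove_outlier_samples(X)
# y_enc = LabelEncoder().fit_transform(y.loc[X_clean.index])                  # label encoding
# X_filtered = SelectKBest(f_classif, k=1000).fit_transform(X_clean, y_enc)   # ANOVA filter; k assumed
```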

Base Models: Random Forest, Gradient Boosting, XGBoost, and SVM were trained with GridSearchCV and 5-fold stratified cross-validation, with optimization methods specified.
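A minimal sketch of that tuning setup for one of the base models (the parameter grid and scoring choice are illustrative assumptions):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [200, 500], "max_depth": [None, 10, 20]},
    scoring="f1_macro",   # scoring choice is an assumption
    cv=cv,
    n_jobs=-1,            # parallel search over the grid
)
# grid.fit(X_train, y_train)
# best_rf = grid.best_estimator_
```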

Transformer Meta-Learner: Probability vectors from base models were input into a Transformer-based meta-learner with multi-head attention, dropout, and Softmax layers.
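One plausible reading of that architecture, sketched in PyTorch (treating each base model's probability vector as one attention token is my interpretation; all dimensions are assumptions):

```python
import torch
import torch.nn as nn

class TransformerMetaLearner(nn.Module):
    """Meta-learner over a sequence of base-model probability vectors."""
    def __init__(self, n_classes=5, d_model=64, n_heads=4, dropout=0.1):
        super().__init__()
        self.embed = nn.Linear(n_classes, d_model)   # lift each probability vector
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, probs):
        # probs: (batch, n_base_models, n_classes), stacked base-model outputs
        x = self.embed(probs)
        attn_out, _ = self.attn(x, x, x)     # self-attention across the base models
        x = self.drop(attn_out).mean(dim=1)  # pool over the model tokens
        return self.head(x)                  # logits; softmax yields class probabilities

# model = TransformerMetaLearner()
# probs = model(torch.rand(8, 4, 5)).softmax(dim=-1)
```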

Evaluation: Performance metrics and a confusion matrix were reported in a dedicated section.
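For reference, the reported metrics map onto standard scikit-learn calls, e.g. (a sketch; the multiclass AUC averaging choice is an assumption):

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)

def evaluate(model, X_test, y_test):
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)
    print("accuracy :", accuracy_score(y_test, y_pred))
    print("macro AUC:", roc_auc_score(y_test, y_proba, multi_class="ovr"))
    print(classification_report(y_test, y_pred))   # per-class precision/recall/F1
    print(confusion_matrix(y_test, y_pred))
```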

Validity of the findings

The results of the study are supported by comprehensive comparisons with strong methods in the literature as well as additional statistical analyses. Maintaining an independent train-test split and reporting multiple performance indicators further reinforces the reliability of the findings. However, the use of limited feature selection techniques remains a point of caution regarding the generalizability of the results to other datasets. As a minor suggestion, I recommend slightly expanding the reference list to include additional relevant and recent studies.

Version 0.1 (original submission)

· Jun 10, 2025 · Academic Editor

Major Revisions

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1 ·

Basic reporting

This manuscript presents a hybrid classification model that employs a stacking ensemble framework combining four base learners—Random Forest, Gradient Boosting, XGBoost, and SVM—whose probability outputs are concatenated and fed into a Transformer-based meta-learner. The Transformer leverages self-attention mechanisms to model the relationships among the base models’ predictions and generate the final cancer type classification.
While the proposed approach is promising and achieves high performance on RNA-Seq data from TCGA, the manuscript requires substantial revisions to improve its clarity, methodology presentation, feature selection justification, and discussion depth. Detailed points for revision have been outlined in the comments below.
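For concreteness, the first stage of such a stack is commonly built from out-of-fold probability vectors, along these lines (a sketch under assumed names and default hyperparameters; the authors' exact procedure may differ):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC
from xgboost import XGBClassifier

base_models = [
    RandomForestClassifier(random_state=42),
    GradientBoostingClassifier(random_state=42),
    XGBClassifier(random_state=42),
    SVC(probability=True, random_state=42),
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each entry is (n_samples, n_classes); concatenation gives the meta-features.
# oof = [cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")
#        for m in base_models]
# meta_X = np.concatenate(oof, axis=1)   # input to the Transformer meta-learner
```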

Experimental design

no comment

Validity of the findings

no comment

Additional comments

General Evaluation: This study proposes a timely and technically sound approach for cancer classification based on RNA-seq data by integrating traditional machine learning models with a Transformer-based meta-learner. The method achieves high accuracy, the architecture is well explained, and the authors provide open-source code to support reproducibility. However, there are several critical areas where the manuscript requires improvement to enhance its scientific quality and clarity.

Below are the main concerns and recommendations:
1. Novelty Claims Are Not Sufficiently Justified
The authors repeatedly emphasize the novelty of their approach; however, they do not clearly distinguish their method from existing Transformer + stacking combinations in the literature. To support this claim, the manuscript should include a comparative discussion with similar methods, such as AutoGluon, TabNet, or other recent ensemble models using deep learning components.
2. Methodology Section Is Structurally Disorganized
The methodology section presents various processing steps in a somewhat scattered order, making it difficult for the reader to follow the workflow clearly. The authors are encouraged to restructure the section using the following logical progression:
• Data Description and Source
• Data Preprocessing (Spearman-based outlier removal, ANOVA filtering, normalization, label encoding)
• Base Model Training (RF, GBT, XGBoost, SVM; GridSearchCV, stratified 5-fold CV, parallel computation)
• Transformer Meta-Learner (probability vectors as input, attention layers, dropout, Softmax)
• Performance Evaluation (accuracy, AUC, F1, ROC curves, confusion matrix)
3. Feature Selection Is Inadequate
The feature selection process, based on ANOVA and correlation filtering, results in the removal of only 217 genes (~1.1% of the original feature set). This level of reduction is insufficient for RNA-seq data, which is inherently high-dimensional. More robust feature selection methods such as LASSO, RFE, Boruta, or mutual information should be considered to improve interpretability and reduce overfitting risk.
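For example, two of these alternatives can be sketched as follows (selector parameters such as C and k are illustrative assumptions):

```python
from sklearn.feature_selection import SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# L1-penalized (LASSO-style) selection: sparse coefficients drop most genes.
lasso_selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1))

# Mutual-information filter keeping the k most informative genes.
mi_selector = SelectKBest(mutual_info_classif, k=500)

# X_lasso = lasso_selector.fit_transform(X_train, y_train)
# X_mi = mi_selector.fit_transform(X_train, y_train)
```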
4. Data Description and Variable Structure Are Incomplete
Several critical details regarding the dataset are missing. These include the type of gene identifiers used, the genome annotation version (e.g., GRCh37 vs. GRCh38), and the normalization technique (e.g., TPM, RPKM, log transformation). Additionally, the target label “Cancer_tumor” is not clearly defined—there is no information about the cancer classes or the sample size per class. These omissions hinder reproducibility and make it difficult to assess class imbalance.
5. Discussion Section Is Underdeveloped
The discussion section is weaker compared to the methodological depth of the study. The following issues should be addressed:
• Lack of comparison with relevant prior work in the literature
• Superficial treatment of the study’s limitations (e.g., reliance on a single dataset, absence of external validation)
• Vague statements regarding clinical applicability
• No evidence or visualization to support claims of explainability using the attention mechanism
The potential of Transformer-based attention for biological interpretability is significant and should be further explored, preferably with visual examples or case-specific gene importance analysis.
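For instance, a case-specific gene-importance analysis could be sketched with SHAP as follows (shown for a single tree-based base model such as XGBoost; explaining the full stack would require a model-agnostic explainer, e.g. shap.KernelExplainer):

```python
import shap

# Assumed names: fitted_xgb is a trained XGBoost base model, X_test the
# held-out expression matrix, gene_names the feature labels.
# explainer = shap.TreeExplainer(fitted_xgb)
# shap_values = explainer.shap_values(X_test)
# shap.summary_plot(shap_values, X_test, feature_names=gene_names)
```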
6. Writing and Language Quality
Although the manuscript is generally understandable, there are numerous stylistic and grammatical issues that reduce clarity. Phrases like "is able to" and "the biggest innovation" are overused and not well-suited for academic writing. The authors are advised to revise the manuscript with the assistance of a native English speaker or a professional language editing service.
7. References Require Major Revision
The reference list contains several issues:
• Key related works—particularly recent Transformer-based models in biomedical classification—are missing and should be added. Notable omissions include TabTransformer, AutoGluon, and relevant deep ensemble frameworks.
• Some citations are not directly relevant or are from low-impact sources.
• There are multiple formatting inconsistencies, such as inconsistent author name presentation, journal title abbreviations, and missing DOIs. The reference list should be thoroughly revised to comply with the journal’s citation style and to better support the manuscript’s claims.

Reviewer 2 ·

Basic reporting

Interpretability:
Page 9, Lines 141-146: The authors state that "we can analyze the prediction results and important features of each base model". Interpretability typically refers to a mapping between features and the outcome, not, as here in the ensemble setting, between prediction results and the outcome. Can the authors explain how understanding which prediction result is more important for a certain sample improves interpretability?

Page 10, Line 191 - Please rephrase this sentence, as it is confusing: "average correlation coefficient of each sample with its samples was calculated".

Page 10, Line 202 - should be RNA-seq not "RAN-seq". Typo.

Fig 1 - The spelling of "Block" is wrong. The figure in general has a strange font.

Experimental design

Page 10, Differential Expression Analysis: Since multiple statistical comparisons are being performed here, the authors should use Bonferroni- or FDR-corrected p-values to account for multiple testing.
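For example, with statsmodels (a sketch; p_values and gene_names are assumed variable names holding the per-gene ANOVA results):

```python
from statsmodels.stats.multitest import multipletests

# Benjamini-Hochberg FDR; method="bonferroni" is the stricter alternative.
# reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
# significant_genes = gene_names[reject]
```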

The authors should compare the performance of the transformer stack with some traditional ensembling techniques.

Validity of the findings

The authors should perform a statistical comparison of the AUC values obtained with the different methods. From the numbers in Fig 6 (all above 0.98), it appears that there may be no statistically significant difference between the transformer stack and the other models, which would obviate the need for transformer-stack-based ensembling of the ML model predictions.
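One way to do this is a paired bootstrap over the shared test set, sketched below (a DeLong test would also serve but needs a third-party implementation; this sketch assumes NumPy arrays and multiclass probability outputs):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_diff(y_true, proba_a, proba_b, n_boot=2000, seed=0):
    """Two-sided bootstrap test of the AUC difference between two models
    evaluated on the same test samples."""
    rng = np.random.default_rng(seed)
    n, diffs = len(y_true), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)      # resample test samples with replacement
        try:
            diffs.append(
                roc_auc_score(y_true[idx], proba_a[idx], multi_class="ovr")
                - roc_auc_score(y_true[idx], proba_b[idx], multi_class="ovr"))
        except ValueError:               # a resample may be missing a class
            continue
    diffs = np.asarray(diffs)
    p = min(1.0, 2 * min((diffs <= 0).mean(), (diffs >= 0).mean()))
    return diffs.mean(), p
```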

The authors should double-check the number of samples of each class in the different confusion matrices in Fig 5. Some of the numbers differ. For example, the number of correctly classified samples in the STAD class is 205 for RF but 211 for the Transformer Stack, even though there are no misclassifications (all other rows and columns are 0). Similarly, BRCA is 42 for RF and 77 for the Transformer Stack, again without any misclassification.

Finally, since the authors claim that this transformer stack ensemble performs better than traditional consensus-based ensembles, they should use a traditional ensemble as a baseline model. Here, too, it is important to show statistically significant differences between the performances.
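A natural baseline of this kind is soft voting over the same four base learners, e.g. (estimator settings are illustrative):

```python
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.svm import SVC
from xgboost import XGBClassifier

voting_baseline = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("gbt", GradientBoostingClassifier(random_state=42)),
        ("xgb", XGBClassifier(random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),
    ],
    voting="soft",   # average the predicted class probabilities
)
# voting_baseline.fit(X_train, y_train)
# baseline_proba = voting_baseline.predict_proba(X_test)
```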

It is unclear whether an independent test set was used. Please make this clear.
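If no such split exists, one can be fixed before any tuning, e.g. (a sketch with an assumed 80/20 stratified split; X and y are assumed names):

```python
from sklearn.model_selection import train_test_split

# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.2, stratify=y, random_state=42)
```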

Fig 7 needs more points to ascertain the batch size at which cross-validation performance starts to improve.

Supplementary Table 3 shows 10-fold cross-validation, but the paper says 5-fold cross-validation was used. Please correct this inconsistency.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.