Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on August 5th, 2025 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on September 24th, 2025.
  • The first revision was submitted on October 10th, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on October 30th, 2025.

Version 0.2 (accepted)

Academic Editor

Accept

The reviewers agree that all previous issues have been resolved, particularly those concerning methodology and experimental design. Reviewer 3 would have liked to see more randomized trials but agrees with the design. Reviewer 3 also recommends checking the citation formats; note that the PeerJ citation style is an author–date referencing format similar to the Harvard style.

[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]

Reviewer 2

Basic reporting

The manuscript is well-structured and clearly written, fulfilling the Basic Reporting criteria outlined by the journal.
All previously noted minor issues have been addressed: the misprinted algorithm name ("PRO") has been corrected to "PPO", and the discrepancies between figure captions and in-text citations have been systematically resolved. The figures are now sequentially numbered (Figures 7–14) and correctly correspond to their textual mentions, ensuring consistency and readability. This thorough revision has improved the manuscript's presentation quality and academic reliability. The English throughout the paper is clear, technically accurate, and consistent with scientific writing standards.

Experimental design

The experimental design is rigorous and well-justified. The authors effectively demonstrate the generalization performance of the proposed reinforcement learning model across various scenarios. The inclusion of multiple environments and comparative evaluations underscores the robustness of the proposed methodology. The repository is intuitively organized, and file naming is consistent with the methods described in the paper. Furthermore, the new hyperparameter table presents key details such as learning rate, batch size, discount factor (γ), and weight decay values. This addition enables readers to replicate and verify the reported results, following best practices for experimental transparency.

Validity of the findings

The results are robust and convincingly support the conclusions drawn in the paper. Statistical measurements and performance metrics—such as mean return, convergence rate, and standard deviations across test runs—are properly analyzed and interpreted. The manuscript effectively compares the proposed PPO-based approach against baseline algorithms, providing a balanced and evidence-driven analysis.
The discussion section integrates quantitative insights with theoretical explanations, linking experimental outcomes to the conceptual contributions presented earlier. The authors also acknowledge potential limitations, including possible overfitting in certain environments and the necessity for broader validation using different model architectures. These self-critical reflections enhance the academic integrity of the study. Overall, the conclusions are valid, consistent with the presented data, and demonstrate the method's contribution to reinforcement learning research.

Reviewer 3

Basic reporting

A. Overall Assessment
The authors have made substantial revisions addressing nearly all major and minor comments raised in Round 1. The manuscript now shows clear improvements in readability, methodological transparency, and experimental validation. The responses are detailed and constructive, demonstrating the authors’ commitment to improving the work. The revised paper is now much closer to being publishable, with only a few minor editorial and structural refinements suggested below.

B. Evaluation of Authors’ Responses
1) Issue: Language and academic tone
Authors’ Response: Paper polished with the help of native English-speaking researchers.
Assessment: Addressed – The manuscript reads more clearly and professionally. Minor stylistic editing by the journal may still be helpful.

2) Issue: Missing or outdated literature (2015–2017)
Authors’ Response: Added recent references (2022–2025) related to DRL, 3D environments, and decision transformers.
Assessment: Addressed – Background section now more current and relevant.

3) Issue: Missing strong baselines (e.g., Rainbow, IMPALA)
Authors’ Response: Added Recurrent Rainbow as a new baseline algorithm and discussed results comparatively.
Assessment: Addressed – Strengthens benchmarking.

4) Issue: Clarity of figures and legends
Authors’ Response: Improved labeling, ordering, and captions of Figures 7–14.
Assessment: Addressed – Figures are now more consistent and readable.

5) Issue: Overstated claims ('far exceeds human players')
Authors’ Response: Language revised for accuracy and restraint.
Assessment: Addressed – Tone is now appropriate.

6) Issue: Discussion of novelty
Authors’ Response: Added discussion highlighting Gunner’s three main contributions (role of dueling networks, deeper ResNet benefits, LSTM vs Transformer comparison).
Assessment: Addressed – Clearer articulation of contribution.

Minor suggestions:
1. The newly added sections (especially Additional Experiments and Discussion) could benefit from final editorial polishing for smoother transitions.
2. Ensure all new references follow PeerJ’s citation format consistently.
3. The Recurrent Rainbow comparison should explicitly report the performance gap numerically (if not already included).

C. Ethical and Reproducibility Considerations
No ethical or plagiarism issues detected. Code availability and clear hyperparameter tables enhance reproducibility. The addition of random-seed experiments improves robustness and transparency.

Experimental design

4) Issue: Lack of reproducibility (no hyperparameter table, no code)
Authors’ Response: Added Table 2 listing all key hyperparameters; provided code and training scripts to the editor for open release post-acceptance.
Assessment: Addressed – Sufficient for reproducibility.

5) Issue: Limited robustness analysis (single-seed runs)
Authors’ Response: Conducted 3 randomized runs for Gunner and Gunner(ResNet); analyzed average performance and variance.
Assessment: Partially addressed – The added runs strengthen results, though limited to selected algorithms due to computational constraints.

Validity of the findings

The writing quality has improved significantly. The methodology is more transparent, and the results and discussion now clearly align with the stated objectives. The paper demonstrates an incremental but valuable contribution to benchmarking DRL algorithms in 3D game environments.

Version 0.1 (original submission)

Academic Editor

Major Revisions

I suggest a thorough reading of the comments and suggestions from both reviewers in order to revise and improve the current version of the paper.

**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff

Reviewer 1

Basic reporting

-

Experimental design

-

Validity of the findings

-

Reviewer 2

Basic reporting

Overall, the manuscript is well written and meets the Basic Reporting criteria. The background is adequately referenced, and the narrative and figures are clear. Two minor editorial fixes are needed:

(1) Algorithm name typo — “PPO” is misspelled as “PRO” (e.g., p.10); please correct it to “PPO.”

(2) Figure numbering/order — some figure captions and in-text citations are misaligned; please reorder the figures and update all in-text references accordingly. With these changes, the Basic Reporting standard will be satisfied.

Experimental design

Evaluating generalization across multiple scenarios is a strength. For reproducibility, please provide:

(1) the code/repository link for training and evaluation (including run scripts/configs) and

(2) a concise hyperparameter table (e.g., learning rate, batch size, γ, target update period/τ, replay buffer, optimizer/weight decay).
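
For illustration only, the requested table could be backed by a simple configuration record in the released code; the values below are placeholders (assumptions for this sketch), not the manuscript's actual settings:

```python
# Illustrative hyperparameter record; all values are placeholders,
# not the settings actually used in the manuscript.
hyperparameters = {
    "learning_rate": 2.5e-4,       # optimizer step size
    "batch_size": 64,              # transitions per gradient update
    "gamma": 0.99,                 # discount factor
    "target_update_period": 1000,  # steps between target-network syncs (or tau for soft updates)
    "replay_buffer_size": 100_000, # stored transitions
    "optimizer": "Adam",
    "weight_decay": 1e-5,
}

for name, value in hyperparameters.items():
    print(f"{name}: {value}")
```

Keeping such a record alongside the run scripts/configs makes it straightforward to regenerate the table and verify that the paper and the code agree.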

Validity of the findings

Across some scenarios, Gunner(ResNet) appears to generalize better than the full Gunner model. Please add an explanation for this reversal to strengthen the conclusions—for example:

(i) potential overfitting due to parameter count/regularization,

(ii) interactions between the LSTM/memory module and the reward/preprocessing,

(iii) differing reliance on engine-derived features. We also recommend a targeted Gunner vs Gunner(ResNet) ablation under identical settings with 3–5 seeds (reporting mean ± SD/CI) and a simple effect size or nonparametric test.
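
As a minimal sketch of the suggested analysis, assuming per-seed mean returns are available for both variants (the values below are hypothetical placeholders, not results from the paper):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-seed mean returns for the two variants (5 seeds each).
gunner = np.array([12.4, 11.9, 13.1, 12.7, 12.0])
gunner_resnet = np.array([13.5, 13.0, 14.2, 13.8, 13.3])

# Mean ± sample standard deviation per variant.
for name, scores in [("Gunner", gunner), ("Gunner(ResNet)", gunner_resnet)]:
    print(f"{name}: {scores.mean():.2f} ± {scores.std(ddof=1):.2f}")

# Nonparametric comparison (Mann-Whitney U) plus a simple effect size
# (rank-biserial correlation derived from U).
u_stat, p_value = mannwhitneyu(gunner, gunner_resnet, alternative="two-sided")
n1, n2 = len(gunner), len(gunner_resnet)
rank_biserial = 1.0 - 2.0 * u_stat / (n1 * n2)
print(f"U = {u_stat:.1f}, p = {p_value:.3f}, rank-biserial r = {rank_biserial:.2f}")
```

A bootstrapped confidence interval over the per-seed means would be an equally acceptable alternative to reporting the standard deviation.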

Reviewer 3

Basic reporting

- The manuscript is generally well structured, following the standard format of abstract, introduction, methodology, results, discussion, and conclusion.
- English usage is understandable, but the manuscript would benefit from language polishing. Some sentences are lengthy or awkwardly phrased, which can make comprehension difficult for an international audience.
- Figures and tables are appropriate, clearly labeled, and relevant. Raw data and results are sufficiently described.
- Literature review is extensive, covering both classical DRL algorithms (DQN, Rainbow, A3C, DRQN) and recent advances (IMPALA, Transformers, SPR, MuZero). However, some references are dated (e.g., citations around 2015–2017 dominate). Including more recent works (2022–2025) in DRL for 3D environments (e.g., decision transformers, memory-efficient RL, or hybrid model-based/model-free methods) would strengthen the background.

Experimental design

- The research question—whether advanced architectures (ResNet, LSTM, Dueling and Noisy Networks) combined in the proposed “Gunner” agent improve RL performance in Doom—is clear and relevant.
- The study is original in its combination of DRL techniques and testing across multiple Doom scenarios (Deathmatch, Health Gathering Supreme, Defend the Center).
- Methods are described with reasonable detail, but some hyperparameter settings (e.g., learning rate, batch size, replay buffer size, exploration parameters) could be specified more systematically for reproducibility.
- The choice of baselines (DQN, DRQN, Transformer variants) is appropriate, but comparison to Rainbow or IMPALA directly would provide stronger benchmarking, since they are closer in spirit to Gunner.

Validity of the findings

- The experiments are thorough, covering three Doom scenarios with both performance (Frags, kills, survival time) and stability measures.
- Results convincingly show that Gunner outperforms DQN, DRQN, and Transformer variants. Statistical summaries (Table 3) are provided, including confidence intervals.
- However, variance in training outcomes could be better analyzed. For example, repeating runs with different seeds would help assess robustness.
- The conclusions are mostly supported by the data, but some claims (“far exceeds human players,” “highly competitive level”) should be toned down or backed with explicit comparison to human benchmarks.

Additional comments

4. General Comments
Strengths:
- Well-motivated problem (3D RL vs 2D Atari).
- Systematic experimental evaluation in multiple environments.
- Clear architecture description of Gunner, with modular components (ResNet backbone, LSTM memory, Dueling + Noisy networks); a generic sketch of how such components compose follows this list.
- Good integration of past DRL research.
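
For readers unfamiliar with how such modular components typically fit together, the sketch below shows one generic way a ResNet-style encoder, an LSTM memory, and a dueling head can be composed in PyTorch. It is an illustration under assumed module names and sizes, not the authors' Gunner implementation, and it omits the noisy layers for brevity.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic residual block for the convolutional backbone."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class DuelingRecurrentAgent(nn.Module):
    """ResNet-style encoder -> LSTM memory -> dueling value/advantage heads."""
    def __init__(self, in_channels=3, n_actions=8, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            ResidualBlock(32),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            ResidualBlock(64),
            nn.AdaptiveAvgPool2d((7, 7)),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, hidden), nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, a)

    def forward(self, frames, state=None):
        # frames: (batch, time, channels, height, width)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        feats, state = self.lstm(feats, state)
        v, a = self.value(feats), self.advantage(feats)
        q = v + a - a.mean(dim=-1, keepdim=True)       # dueling aggregation
        return q, state

# Example: Q-values for a batch of 2 sequences of 4 RGB frames (84x84).
agent = DuelingRecurrentAgent()
q_values, _ = agent(torch.randn(2, 4, 3, 84, 84))
print(q_values.shape)  # torch.Size([2, 4, 8])
```

The dueling aggregation subtracts the mean advantage so that the value and advantage streams remain identifiable.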

Weaknesses / Suggestions:
1. Language clarity: Improve English expression and sentence flow.
2. Reproducibility: Provide a complete table of hyperparameters and details of training procedures (learning rate schedule, replay buffer size, gradient clipping, etc.).
3. Baselines: Consider comparing against Rainbow and/or IMPALA directly. These are strong references in scalable RL.
4. Statistical rigor: Multiple training runs with different seeds would strengthen confidence in results.
5. Figures: Some plots (e.g., Fig. 7–10) could benefit from clearer legends and error bars.
6. Discussion: The discussion could be expanded to highlight how Gunner contributes beyond being a combination of existing techniques. What unique insight does it bring?
7. Recent references: Add more citations from the last 2–3 years on DRL in 3D environments and Transformer-based RL.

Decision: Major Revision
The paper presents promising results and a well-designed algorithm. However, improvements in writing clarity, reproducibility, and benchmarking against stronger baselines are necessary before publication.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.