Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

Summary

  • The initial submission of this article was received on March 20th, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on July 7th, 2025.
  • The first revision was submitted on July 30th, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • A further revision was submitted on September 17th, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on October 9th, 2025.

Version 0.3 (accepted)

Academic Editor

Accept

The paper may be accepted.

**PeerJ Staff Note:** This decision was reviewed and approved by Yilun Shang, a PeerJ Section Editor covering this Section.

**PeerJ Staff Note:** Although the Academic and Section Editors are happy to accept your article as being scientifically sound, a final check of the manuscript shows that it would benefit from further editing. Therefore, please identify necessary edits and address these while in proof stage.

Reviewer

Basic reporting

The manuscript is written in clear, professional English and presents a coherent narrative that situates the contribution within current work on adaptive agents for competitive fighting games. The paper defines all key constructs with precision and provides formal, reproducible specifications of the core evaluation metrics in Appendix A, including the Engagement Score, Opponent Adaptation Score, and Win Consistency Score, each given with explicit formulas and normalization ranges that support independent verification. The methods section offers a transparent description of the LSTM-with-attention network and the PSO-driven weight adaptation, with an architecture overview and layer configuration summarized in Figure 1 and Table 2, which together aid technical readability for a computational audience. The reporting of the emulator environment and API wrapper for Street Fighter II is sufficiently detailed to anchor replication, including state extraction, discrete action encoding, and timing constraints. Raw material for reproduction is addressed through a reproducibility bundle with code and scripts that implement metric aggregation and analysis, which meets the journal’s expectation for data and code transparency. Presentation quality is high: figures are legible, captions are informative, and tables collect the most relevant quantitative comparisons, notably the comprehensive benchmark table and the ablation table. The literature review traces rule-based, DRL/HRL, and opponent-modeling approaches with appropriate currency and breadth, and the manuscript frames the study as empirical, which renders theorem-proof structure not applicable while still providing formal definitions where needed. Minor layout artifacts from PDF text extraction do not impede comprehension and can be addressed in production. On balance, basic reporting standards are fully met.

Experimental design

The study articulates a clear and relevant research question—whether PSO-optimized neural networks can deliver human-aligned, adaptive behavior in real-time fighting games—within the journal’s scope for original computational research. The design implements online, gradient-free PSO updates to neural weights during gameplay, with an explicit multi-swarm variant to promote exploration and avoid local minima; PSO control parameters and termination criteria are reported in tabular form, and sensitivity analysis is described to justify chosen settings. The neural policy, state-action representation, and decision policy are specified with dimensionalities, equations, and fixed ε-greedy behavior to enable faithful replication, alongside a common 28-feature state vector and ten-action discrete space shared across baselines. Real-time constraints are handled with two timing channels—sub-100 ms decision latency and asynchronous PSO optimization intervals—so the experimental platform reflects the practical demands of a high-tempo control loop. Human-subject evaluation enrolls sixty participants stratified by skill level, assigns randomized match order, and records only performance logs; ethical safeguards are documented with reference to minimal-risk standards and consent procedures. Comparative fairness is strengthened by normalizing budgets, data access, state encodings, and emulator integration across all baseline agents, thereby isolating algorithmic differences. Rating-system parameters are precisely stated for ELO and Glicko-2, supporting interpretability of longitudinal skill estimates. The replay buffer protocol, pruning logic, and 30% behavior-change trigger for re-invoking PSO are described in operational terms that a reproducer can implement without ambiguity.
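For readers who want the mechanism pinned down, the gradient-free update at the heart of this design is the standard PSO velocity/position rule. The sketch below is a generic textbook formulation under assumed default coefficients (w = 0.7, c1 = c2 = 1.5), not the authors' code; a multi-swarm variant would simply run several independent swarms and periodically exchange their global bests.

```python
import numpy as np

def pso_step(pos, vel, pbest_pos, pbest_fit, gbest_pos, gbest_fit,
             fitness_fn, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One gradient-free PSO iteration over a swarm of flattened weight vectors."""
    if rng is None:
        rng = np.random.default_rng()
    n, d = pos.shape
    r1, r2 = rng.random((n, d)), rng.random((n, d))
    # Velocity update: inertia term + cognitive pull toward each particle's
    # personal best + social pull toward the swarm's global best.
    vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
    pos = pos + vel
    # Re-evaluate fitness, e.g., recent match performance from the replay buffer.
    fit = np.array([fitness_fn(p) for p in pos])
    better = fit > pbest_fit
    pbest_pos[better], pbest_fit[better] = pos[better], fit[better]
    if fit.max() > gbest_fit:
        gbest_pos, gbest_fit = pos[fit.argmax()].copy(), fit.max()
    return pos, vel, pbest_pos, pbest_fit, gbest_pos, gbest_fit
```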

Validity of the findings

The findings are supported by robust quantitative evidence, appropriate statistics, and transparent uncertainty quantification that match the study’s claims. The manuscript reports higher ELO and Glicko-2 ratings for the adaptive agent than for baseline agents and the human average, and it quantifies uncertainty with 95% bootstrap confidence intervals visualized in figures and described in text. A Wilcoxon signed-rank test on per-participant ELO deltas yields W = 15.0 with p = .031 and a matched-pairs rank-biserial effect size r = 0.42, which provides non-parametric support for the observed advantage under a within-subjects design. Composite engagement and adaptation metrics are reported near human levels for the proposed agent, with the engagement measure at 0.95 against a human reference near 0.98 and the adaptation measure at 0.91, both accompanied by bootstrap intervals. The benchmark table consolidates outcomes for all agents, and the ablation table demonstrates internal validity by showing that removing online PSO, collapsing to a single swarm, or dropping attention reduces ratings and custom metrics in expected directions. The caption and methods correctly distinguish decision latency from the PSO optimization interval so that readers do not confuse inference responsiveness with asynchronous weight updates. Limitations are candidly acknowledged—domain specificity to Street Fighter II, engineered features from an emulator API rather than raw perception, and non-blinded identification—which appropriately bound external validity without undermining the core conclusions. Availability of scripts and metric definitions further strengthens reproducibility of the statistical results. The conclusions are therefore well linked to the evidence and remain within the supportable scope of the analyses.

Additional comments

Only minor editorial refinements would further streamline the paper for archival clarity while leaving the scientific content unchanged. The production team can eliminate residual text-encoding artifacts in the PDF (for example, garbled diacritics that appear in a few places) and ensure consistent hyphenation for terms like “multi-swarm” and “real-time.” The caption to Table 5 already clarifies that the reported “Response Time” is the mean PSO optimization interval rather than the action-selection latency; adding a brief pointer to this distinction in the results text at first mention would preempt misinterpretation by casual readers. The data-availability statement might explicitly name the hosting location of the reproducibility bundle and, if feasible, include an anonymized version of the per-match logs to enable independent recomputation of the composite metrics. Visual summaries are strong; if space permits, a schematic of the experience-replay pruning policy could complement Figure 1 to reinforce how stale transitions are handled during long runs. The limitations section already sets appropriate boundaries on generalization claims, and the discussion convincingly frames practical significance in terms of engagement and adaptability rather than superhuman performance per se. No methodological additions are required for validity, and no additional experiments are necessary for acceptance.

Reviewer

Basic reporting

The text has been redrafted and does not present any formal problems, nor did I find any ambiguities.

Experimental design

I maintain my previous opinion; the topic is relevant.

Validity of the findings

I maintain my previous opinion; the topic is relevant.

Additional comments

I have no further comments. The document does not present any problems, and I believe it is valid for acceptance and publication.

Version 0.2

Academic Editor

Minor Revisions

Incorporate the comments of the reviewers.

Reviewer

Basic reporting

The manuscript presents a clearly scoped contribution to adaptive game AI, but several features of reporting require attention before the study can be considered methodologically transparent and fully self-contained. The prose is generally intelligible and uses field-appropriate terminology; however, recurring syntactic issues and occasional word choice errors reduce clarity and would benefit from professional copyediting (e.g., tense agreement, preposition use, and definite articles in technical definitions). The literature review identifies core strands—finite-state controllers, DRL/HRL approaches, opponent modeling, and swarm methods—and situates the contribution plausibly within that landscape; nonetheless, the discussion of prior work sometimes collapses distinct problem framings (agent difficulty scaling versus human-alignment and engagement) and should better differentiate evaluation goals across cited studies. Figures and tables are legible and relevant, with Figure 1 outlining the hybrid LSTM-attention architecture and Tables 2–5 summarizing architecture and metrics; yet, several captions should specify data sources and computation details so that figures function as scientific evidence rather than illustration (e.g., the construction of the correlation heatmap and the bootstrap procedure underlying Figure 3 error bars). Definitions for bespoke metrics remain under-specified: “Engagement Score,” “Opponent Adaptation Score,” and “Win Consistency Score” are described qualitatively, but the exact formulas, normalization constants, and aggregation horizons are not provided, which prevents independent replication. Multiple internal inconsistencies must be reconciled in the text, notably the PSO coefficients reported as c1 = c2 = 1.5 near Equations 6–7 versus c1 = 1.5 and c2 = 1.7 in Table 3, and the claim of sub-100 ms action selection latency versus a “Response Time” of 2.3 s in Table 5; readers cannot interpret results unless latency constructs are defined and measured consistently. Raw data and code are referenced as available in the submission package; however, the review copy does not provide an accessible repository link, a data dictionary for the 28-dimensional state vector and 10-action encoding, plotting scripts for Figures 2–6, or the seeds/config files used to reproduce runs. The manuscript would pass basic reporting with: (a) a full metric appendix giving exact formulas and parameter values; (b) a reproducibility bundle containing code, config files, seeds, and plotting scripts; (c) harmonized notation and parameter values across text, tables, and algorithms; and (d) a copyedited revision that addresses recurring language issues.

Experimental design

The research question is well framed for PeerJ Computer Science standards: can gradient-free, multi-swarm PSO deliver online weight adaptation for a neural agent in a real-time fighting game, thereby improving adaptability and perceived engagement relative to strong baselines? The overall design—a hybrid sequence model with PSO-driven online weight updates evaluated against state-of-the-art agents and human opponents—is appropriate, but essential methodological details are missing or ambiguous in ways that currently prevent replication. The environment description identifies “Street Fighter” but does not specify game version, platform, frame rate, rendering pipeline, API for state extraction, or legality/EULA compliance for instrumentation—each factor can materially influence latency and action availability. The 28-feature state vector and 10-action set are named at a high level, yet mapping rules, feature scaling, and pre-processing are not operationalized; without a data dictionary and code, equivalence cannot be ensured across replications. The reward function (Equation 13) introduces weights w1–w3, but these coefficients and their tuning procedure are not reported; in adaptive control, reward shaping is performance-critical and must be documented. The ε-greedy exploration strategy is mentioned without schedule or annealing parameters; the replay buffer is described (size 100,000; pruning policy based on age and cumulative reward), but pruning thresholds and sampling policies need formal specification. The human-subject protocol uses a within-subjects design with randomized model order and 60 participants distributed by skill tier, which is a reasonable structure; however, inclusion/exclusion criteria, hardware controls (display latency, input devices), compensation, and demographic descriptors are not reported. Ethical oversight is presented as a self-disclosure with informed consent; PeerJ readers will expect either a formal IRB/non-human subjects determination or an institutional ethics waiver, plus a brief justification aligned to minimal-risk criteria. Finally, the evaluation stack (ELO/Glicko-2) requires parameter reporting (K-factor, τ, initial RD, volatility), and the provenance of comparison agents (original implementations versus re-implementations) should be clarified to avoid unfairness from mismatched training or integration conditions. Addressing these issues will raise the investigation to a high technical and ethical standard with sufficient information for replication.
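To make the rating-parameter point concrete: the K-factor this review asks to see reported directly controls how fast ELO estimates move, as in the standard update below (a generic sketch with an illustrative K = 32, not values from the manuscript).

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """Standard ELO update for player A after one match.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss. The K-factor
    scales the correction, which is why it must be reported for the
    longitudinal ratings to be interpretable.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    return r_a + k * (score_a - expected_a)

# Example: a 1500-rated agent beating a 1600-rated human moves to ~1520.5.
# elo_update(1500, 1600, 1.0)
```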

Validity of the findings

The results section reports favorable values across ELO, Glicko-2, engagement, adaptation, and consistency, but several analytic choices currently limit inferential validity and should be strengthened.

First, the correlation heatmap aggregates one row per agent and then computes Pearson coefficients across 5–6 aggregate points per metric; with such a small N and inter-metric dependencies, correlation magnitudes near ±0.97 are not interpretable statistically and should not be used to support substantive claims about engagement dynamics.
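To quantify that concern under a simple normality assumption: a Fisher z interval for a Pearson coefficient computed from only six points is extremely wide, as the illustrative check below shows (not an analysis from the manuscript).

```python
import numpy as np
from scipy import stats

def pearson_ci(r, n, alpha=0.05):
    """Confidence interval for a Pearson correlation via the Fisher z transform."""
    z, se = np.arctanh(r), 1.0 / np.sqrt(n - 3)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return float(np.tanh(z - z_crit * se)), float(np.tanh(z + z_crit * se))

# With n = 6 agent-level points, even r = 0.97 is barely informative:
# pearson_ci(0.97, 6)  ->  roughly (0.74, 1.00)
```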

Second, the Wilcoxon signed-rank test (W = 15.0, p = .031) is reported without a clear statement of the paired structure, unit of analysis, or sample size; if the test compares per-participant ELO deltas, then the number of pairs and any multiple-comparison corrections should be stated, and effect sizes (e.g., matched-pairs rank-biserial r) should be added.
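For concreteness, the quantities being requested could be computed from per-participant ELO deltas along the following lines (a sketch with illustrative variable names, not the authors' analysis script).

```python
import numpy as np
from scipy import stats

def wilcoxon_with_effect_size(deltas):
    """Wilcoxon signed-rank test on paired deltas, plus the matched-pairs
    rank-biserial correlation requested as an effect size."""
    deltas = np.asarray(deltas, dtype=float)
    deltas = deltas[deltas != 0]              # drop zero differences
    ranks = stats.rankdata(np.abs(deltas))    # rank the absolute deltas
    w_pos = ranks[deltas > 0].sum()
    w_neg = ranks[deltas < 0].sum()
    res = stats.wilcoxon(deltas)              # two-sided by default
    r_rb = (w_pos - w_neg) / (w_pos + w_neg)  # rank-biserial effect size
    return {'n_pairs': deltas.size, 'W': res.statistic,
            'p': res.pvalue, 'r_rb': r_rb}

# deltas = per-participant ELO vs. the adaptive agent minus ELO vs. a baseline
```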

Third, the response-time narrative is inconsistent: the methods section asserts sub-100 ms action selection with asynchronous PSO updates during play, while the results table reports a 2.3 s “Response Time”; the construct must be defined (decision latency versus periodic optimization interval) and measured with event-time instrumentation that is uniform across agents.

Fourth, bespoke metrics (“Engagement Score,” “Opponent Adaptation Score,” “Win Consistency Score”) require exact formulas, windows, and reference distributions; otherwise, comparisons to external agents cannot be audited.
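As an illustration of the level of specification being requested, a fully auditable definition might look like the sketch below. This is purely hypothetical and is not the authors' metric; it merely shows that a window length, an aggregation rule, and a normalization must all be stated.

```python
import numpy as np

def win_consistency_score(wins, window=10):
    """Hypothetical example only, not the manuscript's definition:
    1 minus the mean absolute deviation of windowed win rates from the
    overall win rate, which is bounded in [0, 1] by construction."""
    wins = np.asarray(wins, dtype=float)
    assert wins.size >= window, "need at least one full window"
    k = wins.size // window
    windowed = wins[:k * window].reshape(k, window).mean(axis=1)
    return 1.0 - float(np.mean(np.abs(windowed - wins.mean())))
```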

Fifth, uncertainty is presented for ELO via bootstrap error bars in Figure 3, which is appropriate; similar interval estimates should accompany the bespoke metrics to avoid over-precision in single-value comparisons.
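A percentile bootstrap of the kind already used for ELO would transfer directly to the bespoke metrics; a minimal sketch, assuming one metric value per match:

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a per-match metric."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    boots = np.array([stat(rng.choice(values, size=values.size))
                      for _ in range(n_boot)])
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

# e.g., bootstrap_ci(per_match_engagement)  ->  (lower, upper) for the mean
```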

Sixth, ablations are needed to establish internal validity: PSO versus no-PSO, single-swarm versus multi-swarm, with and without attention, and sensitivity to c1/c2/inertia values—especially given the reported discrepancy for c2 across text and Table 3.

Seventh, the fairness of baseline comparisons should be documented: training budgets, data access, and integration constraints for Shukai, Brisket, FightLadder, and human-agent collaboration models must match the proposed system as closely as possible. With clarified constructs, complete metric definitions, uncertainty quantification, and ablations, the conclusions will be appropriately limited to what the results support and will meet PeerJ validity standards.

Additional comments

Several constructive additions would materially improve clarity and community value. A reproducibility package should include executable code, dependency pins, seeds, configuration files for PSO and the network, and scripts that regenerate Tables 2–5 and Figures 1–6 from raw logs; a README should document how to acquire or emulate the game environment legally. The paper would benefit from an explicit computational budget section: wall-clock overhead for PSO updates per interval, GPU/CPU usage, and memory footprint, along with a discussion of scalability to longer bouts and other genres. The architecture section could add a compact diagram of the decision loop and optimization loop with time budgets, clarifying precisely when replay sampling, PSO updates, and action selection occur. The metric appendix should define the three bespoke scores mathematically, include pseudocode for their computation, and justify construct validity by relating each to established human-factors or human-AI interaction literature. The human-study section should report participant recruitment, consent, compensation, hardware standardization, and any exclusion criteria, accompanied by an IRB letter or ethics waiver number. The writing would benefit from unified notation (e.g., consistent vector bolding and subscript conventions), consistent parameter reporting (c1/c2/inertia; initial weight/velocity ranges, which currently appear asymmetrical), and consistent terminology for adaptation intervals during and between rounds. Finally, a limitations paragraph should acknowledge potential threats to validity—domain specificity to one franchise, reliance on engineered features rather than end-to-end perception, and the absence of blinded identification—and outline how future work will address those constraints.

Reviewer

Basic reporting

The authors provide adequate field background and contextualization through an interesting review of relevant literature, which situates their contribution within current research on adaptive AI, reinforcement learning, and particle swarm optimization for gaming environments.
The paper follows a professional structure: abstract, introduction, related works, methodology, results, discussion, and conclusion. Figures and tables are effectively employed. Including raw data in structured tables, alongside detailed algorithmic descriptions, enhances reproducibility and transparency.
The manuscript is self-contained, with results that directly address the stated hypotheses. The findings are consistently tied to the research objectives, particularly regarding adaptability, engagement, and strategic diversity in fighting games.
The results section defines all essential terms, mathematical formulations, and algorithmic steps. Detailed proofs and derivations support theoretical and computational processes, ensuring rigor and clarity. Also, the methodology provides sufficient detail to allow replication by other researchers, demonstrating strong alignment between theoretical underpinnings and experimental validation.
However, I did find some formal issues:
• The phrases expanding the abbreviations DRL (deep reinforcement learning), RL (reinforcement learning), and HRL (hierarchical reinforcement learning) are repeated throughout the text.
• Table 1 appears without any prior reference in the body of the text; it is first cited only at the end of Section 2.2.
• Table 1 cites Halina & Guzdial (2022); however, this work is never referred to in the body of the text except in Sections 4 and 5.
• Table 1 cites Gao et al. (2024), but this work is only discussed after the table appears.
• The citation style in the introduction is inconsistent. For example, “strategies to beat their Mendonca et al. (2015)” should read “strategies to beat their (Mendonca et al., 2015)”; likewise, in Section 2.2, “system by (Zhang et al., 2024)” should read “system by Zhang et al. (2024)”.
• In Section 3, the abbreviation LSTM is used for the first time without any prior expansion; the definition appears only at the end of page 6 (“challenge, a Long Short Term Memory (LSTM)”).
• Table 4 appears without any prior reference in the body of the text; it is first cited only at the end of Section 3.5.
• Section 3.5 refers to a study conducted with 60 people but does not indicate when and where it was conducted, how these people were chosen, or provide any characterization of them. Since these 60 people served only as opponents, what was their prior knowledge of the game or of the prototype used?
• In Section 4, the expansion of the abbreviation PSO (Particle Swarm Optimization) is repeated.
• In line 482, the table number is missing.

Experimental design

The manuscript presents original primary research that falls clearly within the aims and scope of the journal. The research question is well defined, relevant, and meaningful, addressing the development of adaptive AI for fighting games. The authors explicitly state how their approach fills an identified knowledge gap: the lack of real-time adaptable AI systems capable of dynamically adjusting strategies against diverse human playstyles.
The study is conducted rigorously and meets a high technical and ethical standard. The experimental design, including testing with human participants, follows accepted ethical principles, with informed consent procedures outlined. The investigation demonstrates strong methodological robustness, combining particle swarm optimization with neural network adaptation in a novel manner.
The methods are described in sufficient detail to ensure replicability. Algorithmic steps, mathematical formulations, neural network architecture, and optimization procedures are clearly presented. Including figures, tables, and precise hyperparameters further supports reproducibility and transparency, enabling other researchers to replicate and extend the work.

Validity of the findings

The manuscript does not explicitly assess its broader impact and novelty beyond the presented scope. While the technical contribution is evident, the authors could strengthen the work by clarifying its potential implications and positioning its novelty more directly within the literature. Meaningful replication of this research is encouraged, provided that the rationale and benefits to the field are clearly articulated, as this would reinforce the robustness and generalizability of the findings.
All underlying data have been provided through detailed evaluation metrics, statistical analyses, and controlled experimental results. The data appear robust, statistically sound, and appropriately managed, offering transparency and reliability.
The conclusions are clearly stated, well-connected to the original research question, and remain appropriately limited to the results obtained. The authors avoid overstating their claims and ensure that the discussion remains firmly grounded in the supporting evidence generated by the study.

Version 0.1 (original submission)

Academic Editor

Major Revisions

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff

Reviewer 1

Basic reporting

Some technical details, especially those involving RL limitations, should be clarified to improve reader comprehension.

The related work section is good but lacks specific elaboration on why standard online Reinforcement Learning (RL) methods aren’t suitable for real-time fighting games. The authors mention RL methods being slow to adapt but don’t sufficiently detail the technical reasons behind this limitation. Recommend adding 1-2 paragraphs clearly explaining why online RL is inadequate and why PSO provides a better solution.

Figures are of good quality but their captions are insufficiently detailed. Specifically:
• Fig. 2 (Correlation heatmap): Currently lacks context as to why these correlations matter. Recommend explaining explicitly how the correlation between metrics affects adaptive AI performance.
• Figs. 3, 4, and 5: Require richer descriptions that clarify the axes, interpretation, and key takeaways in 2-3 sentences each.

The manuscript meets the requirements, as raw CSV data has been provided.

Formal definitions and explanations of key metrics are provided clearly. However, the inclusion of more explicit descriptions of methods, such as Glicko-2 ratings and ELO calculations, would further enhance transparency.

Experimental design

The topic of adaptive AI in real-time competitive gaming clearly fits the scope of the journal and represents original primary research.

The research question is clearly defined, relevant, and meaningful. The authors clearly state the knowledge gap: existing models lack adaptability and real-time strategic changes.

The experimental methods are technically sound. However, the ethical aspects, particularly regarding human subjects, are insufficiently explained:
• Human Study Procedure: It mentions 60 players each played 10 matches against each AI model but does not clearly explain if each player encountered all AIs (within-subjects) or different subsets of AIs (between-subjects).
• Randomization and Blinding: No details are provided about match order or randomization strategies. It’s unclear if the subjects were blinded regarding which AI model they faced.
• Ethics Approval and Consent: No IRB or ethics approval ID is mentioned, nor are the procedures for informed consent explained. These need to be explicitly detailed.

Validity of the findings

Underlying data is provided, robust, and statistically sound, though authors could enhance statistical rigor by:
• Including 95% Confidence Intervals (CIs) explicitly for Fig. 3 (ELO scores).
• Conducting and clearly reporting a non-parametric statistical test (e.g., Wilcoxon signed-rank test) to validate ELO rating differences versus human baseline.

The authors provide an “engagement score” scaled from 0 to 1 but do not sufficiently justify why this particular scaling was used or document its source. Recommend adding a short justification, referencing relevant literature or validation studies.

The conclusions are clearly stated and appropriately linked to the original research questions. The authors successfully avoid overgeneralizing their results beyond the specific domain of fighting games.

Additional comments

This manuscript is promising, presenting a novel adaptive AI approach combining PSO with neural networks, well-positioned within the scope of competitive gaming research. However, key areas must be clarified:
• Literature and RL Clarification: Add explanations detailing RL limitations.
• Figure Captions: Improve descriptive clarity.
• Human Subject Study Details: Provide clear procedures, randomization strategies, blinding methods, and ethical approval details.
• Statistical Clarification: Include Confidence Intervals and appropriate statistical tests clearly.
• Engagement Metric: Provide justification for the used scaling.

Reviewer 2

Basic reporting

The paper covers using PSO as an adaptive way to improve fighting-game AI. The writing is clear, especially in the introduction and in explaining the goal, but it goes into too much detail on the methods and provides less detail and clarity in the results section. My main issue is that the paper uses many metrics from other papers without explaining them; for the benefit of the reader, the authors should explain all of these metrics. The results figures also need improvement, as they convey too much information in a very small space, and the captions for several figures and tables are inadequate.

Experimental design

The experiments are a bit vague: it is not clear when the AI modifies itself. Does it update every frame, or does a separate thread update the network? I ask mainly because the reported response time is 2.3 seconds, which is still large and comparable to active learning in RL. It is also unclear how the humans interact with the system: in Table 5 they appear to serve as a baseline, while the experiments describe humans evaluating the system itself.
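One common pattern that would reconcile fast per-frame action selection with a 2.3-second figure is a background optimization thread that swaps weights in asynchronously. The sketch below shows that reading; the `policy` and `optimizer` interfaces are hypothetical placeholders, not the paper's code.

```python
import threading
import time

class AsyncAdaptiveAgent:
    """Sketch of a two-loop design: a fast decision path called every frame
    and a slow PSO loop that periodically replaces the network weights."""

    def __init__(self, policy, optimizer):
        self.policy = policy        # assumed to expose .act(state) and .set_weights(w)
        self.optimizer = optimizer  # assumed to expose .optimize(replay) -> weights
        self.replay = []
        self._lock = threading.Lock()

    def act(self, state):
        # Fast path: must fit the per-frame decision-latency budget.
        with self._lock:
            return self.policy.act(state)

    def optimize_forever(self, interval_s=2.3):
        # Slow path: under this reading, the 2.3 s in Table 5 would be the
        # optimization interval, not the per-action response time.
        while True:
            time.sleep(interval_s)
            new_weights = self.optimizer.optimize(self.replay)
            with self._lock:
                self.policy.set_weights(new_weights)

# threading.Thread(target=agent.optimize_forever, daemon=True).start()
```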

Validity of the findings

Although the findings sound interesting, no statistical significance testing is reported. There is also no human feedback on how this AI differs from others: how did the humans experience playing against the AI, and did it actually make changes that made sense? Also, is the starting policy pretrained, or does it always start from scratch?

Additional comments

Finally, the paper claims that using PSO to update a policy is new in games, but there are multiple older works, dating back to 2004, in games such as Tic-Tac-Toe and others.

For example: Learning to play games using a PSO-based competitive learning approach

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.