All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
I am pleased to inform you that your manuscript has been accepted for publication following the second round of peer review.
The reviewers have recognized the significant improvements made in response to their feedback, particularly in enhancing clarity, experimental rigor, and theoretical justification. Your revisions have greatly strengthened the manuscript, making it a valuable contribution to the field.
[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]
The revised manuscript now presents a clear, thorough, and well-structured exposition of the proposed tennis auxiliary teaching system. The authors have addressed prior ambiguities by providing a formal Markov Decision Process formulation for the RL-based keyframe extraction, detailed integration steps between the keyframe module and PoseC3D, and explicit specifications for the genetic algorithm layer. The language has been substantially improved through comprehensive proofreading, and the overall academic tone is more consistent. Figures, tables, and references are relevant and up-to-date, with recent works on human action recognition and reinforcement learning included. The inclusion of ablation results, dataset descriptions, and comparative benchmarks enhances transparency and reproducibility.
The experimental design has been significantly strengthened. The dataset has been expanded from 80 to 210 clips, covering diverse strokes, skill levels, lighting conditions, and camera perspectives, with validation against an external benchmark (TAB-100). The methodology now clearly explains preprocessing pipelines, training protocols, hyperparameters, and feature fusion strategies. The ablation studies effectively isolate the contribution of each module (RL keyframes, GA fusion, PoseC3D fine-tuning, DTW scoring), and benchmarking against ST-GCN provides context within the state of the art. The authors have also acknowledged limitations such as potential overfitting, latency constraints, and performance sensitivity to motion blur.
The findings are now well-supported by quantitative and qualitative evidence. Reported accuracy, F1-scores, and correlation with expert scoring demonstrate robust performance gains from the proposed hybrid architecture. The ablation results convincingly show the necessity of each system component, while external validation confirms generalizability beyond the primary dataset. The discussion addresses practical considerations for real-time deployment and acknowledges areas for future improvement. Overall, the results appear valid, reproducible, and aligned with the stated objectives.
The authors have been responsive to reviewer feedback, providing detailed and relevant revisions that resolve the major concerns raised in earlier reviews. The work makes a valuable contribution to intelligent sports coaching systems by integrating reinforcement learning, evolutionary feature fusion, and deep pose-based classification in a cohesive pipeline. Given the thorough revisions, enhanced clarity, robust experimental validation, and clear acknowledgment of limitations, I recommend acceptance of the paper in its current form.
Dear authors, thank you for your submission. Your manuscript has been reviewed and the reports have been received. Please carefully revise the paper in light of these comments and resubmit.
AE Comments: 
•	Avoid repeating “to accurately recognize and evaluate tennis movements” in the opening sentence.
•	Break down the first long sentence into two for improved readability.
•	Correct spacing errors:
o	“human pose -recognition” → “human pose-recognition”
o	“Second , genetic algorithms…” → remove the space after “Second”
•	Clarify how reinforcement learning is integrated with keyframe selection.
•	Specify what types of features are fused using the genetic algorithm (e.g., temporal, spatial, kinematic).
•	Mention the dataset used to achieve the 98.45% classification accuracy.
•	Explain the experimental setup briefly for context.
•	Quantify the performance gain over AGCN and ST-GCN models if possible.
•	Include citation or benchmark data to support the claim.
**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
The manuscript lacks a clear mathematical formulation of the reinforcement learning (RL) process used for keyframe extraction. Crucial elements such as the state space, action space, and reward function are only vaguely described. To enhance transparency and reproducibility, a formal Markov Decision Process (MDP) representation is strongly recommended.
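As an illustration of the level of detail requested, a generic MDP for keyframe selection could be written as below. This is only a sketch: the symbols x_t (features of frame t), m_t (a summary of the frames already selected), and the binary keep/skip action are assumptions introduced here for illustration, not the authors' formulation, and the reward R is deliberately left unspecified.

```latex
% Generic finite-horizon MDP; the keyframe-specific choices are illustrative assumptions.
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)
\]
\begin{align*}
s_t &= (x_t, m_t) \in \mathcal{S}
  && \text{state: current frame features and selection summary (assumed)} \\
a_t &\in \mathcal{A} = \{0, 1\}
  && \text{action: skip (0) or keep (1) frame } t \text{ (assumed)} \\
s_{t+1} &\sim P(\cdot \mid s_t, a_t)
  && \text{transition to the next frame} \\
r_t &= R(s_t, a_t)
  && \text{reward, to be defined by the authors} \\
J(\pi) &= \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} \gamma^{\,t-1} r_t \right]
  && \text{objective maximised by the policy } \pi
\end{align*}
```

Stating each of these five elements explicitly, together with the learning algorithm used to optimise the policy, would be sufficient for a reader to reproduce the extraction step.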
While PoseC3D is referenced as the core module for motion classification, its integration with the upstream keyframe extraction module remains unclear. The paper should clarify the input format (e.g., skeleton sequence or raw image frames), whether the model was pretrained or trained from scratch, and provide visual examples or class-wise confusion matrices to illustrate model behavior.
The terminology in figures and throughout the text is inconsistent. For example, “keyframe” and “key action” are used interchangeably. Figures such as Figure 3 lack sufficient caption detail (e.g., what is being optimized or evaluated). Captions should be revised to be self-contained and informative.
Grammatical issues persist throughout the manuscript. For instance, “the system is helps coach to judge...” is incorrect and should be revised. Thorough proofreading or professional language editing is recommended prior to resubmission to improve overall readability and academic tone.
The paper introduces a hybrid system combining reinforcement learning, genetic algorithm-based fusion, and deep learning-based classification, but does not conduct ablation studies to assess the independent effectiveness of each component. It is strongly recommended to include comparative experiments where components (e.g., GA fusion) are removed or replaced with simpler alternatives (e.g., averaging) to quantify each module's contribution.
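The kind of comparison requested can be sketched in a few lines of Python. The example below is a toy illustration only: the per-stream scores, the binary labels, the 0.5 threshold, and every GA setting (population size, crossover and mutation rates, number of generations) are assumptions made up for this sketch and do not come from the manuscript. It contrasts a plain averaging baseline with fusion weights found by a small genetic algorithm.

```python
import random

def fuse(features, weights):
    """Weighted sum of the per-stream scores of one sample."""
    return sum(w * f for w, f in zip(weights, features))

def normalise(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def accuracy(weights, samples, threshold=0.5):
    """Fitness: fraction of samples classified correctly by the fused score."""
    correct = 0
    for features, label in samples:
        predicted = 1 if fuse(features, weights) >= threshold else 0
        correct += int(predicted == label)
    return correct / len(samples)

def ga_weights(samples, n_streams, pop_size=20, generations=30,
               crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Toy GA over real-valued fusion weights (all settings are assumptions)."""
    rng = random.Random(seed)
    uniform = [1.0 / n_streams] * n_streams
    # Encoding: one real-valued gene per feature stream.
    # The averaging baseline is seeded into the initial population.
    population = [uniform] + [
        normalise([rng.random() for _ in range(n_streams)])
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):  # stopping rule: fixed generation budget
        ranked = sorted(population, key=lambda w: accuracy(w, samples), reverse=True)
        next_gen = ranked[:2]  # elitism: the two best individuals survive unchanged
        while len(next_gen) < pop_size:
            parent_a, parent_b = rng.sample(ranked[:pop_size // 2], 2)
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene comes from one of the two parents.
                child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
            else:
                child = list(parent_a)
            # Gaussian mutation, clipped so that weights stay non-negative.
            child = [max(0.0, g + rng.gauss(0, 0.1)) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(normalise(child))
        population = next_gen
    return max(population, key=lambda w: accuracy(w, samples))

# Made-up data: stream 0 is informative, streams 1 and 2 are misleading.
samples = [
    ([0.9, 0.2, 0.1], 1), ([0.8, 0.1, 0.3], 1), ([0.7, 0.3, 0.2], 1),
    ([0.2, 0.9, 0.8], 0), ([0.1, 0.7, 0.9], 0), ([0.3, 0.8, 0.6], 0),
]
uniform = [1 / 3, 1 / 3, 1 / 3]
best = ga_weights(samples, n_streams=3)
print("averaging baseline accuracy:", accuracy(uniform, samples))
print("GA-weighted fusion accuracy:", accuracy(best, samples))
```

Because the averaging baseline is part of the initial population and the best individuals are carried over unchanged, the GA result in this sketch can never score below the baseline on the data it was fitted to. A meaningful ablation would therefore report both variants on held-out data.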
The experiment is based on a dataset of only 80 video clips, which is insufficient for training and evaluating a multi-stage system with high generalization requirements. Furthermore, the composition of the dataset—such as stroke diversity, player skill level, lighting variation, and camera conditions—is not clearly detailed. Authors should consider expanding the dataset or evaluating on an established external tennis dataset to validate robustness and generalizability.
There is no discussion of the computational cost of the system. Given its potential application in real-time instructional settings, it is important to report average processing time per frame or per video to assess real-time viability.
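One simple way to obtain such figures is to time the per-frame processing function directly. The sketch below is only an illustration under stated assumptions: `process_frame` stands for whatever callable runs the pipeline on a single frame, and the function name `time_per_frame` and the warm-up count are hypothetical choices, not part of the authors' system.

```python
import statistics
import time

def time_per_frame(process_frame, frames, warmup=2):
    """Measure wall-clock latency of `process_frame` over a list of frames.

    `process_frame` is a placeholder for the pipeline's per-frame callable.
    """
    # Warm-up calls are not timed, so one-off costs (caches, lazy
    # initialisation) do not distort the per-frame figures.
    for frame in frames[:warmup]:
        process_frame(frame)

    durations_ms = []
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)
        durations_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(durations_ms)
    return {
        "frames": len(durations_ms),
        "mean_ms": mean_ms,
        "median_ms": statistics.median(durations_ms),
        "max_ms": max(durations_ms),
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }

# Usage with a stand-in workload instead of a real model.
report = time_per_frame(lambda frame: sum(range(1000)), list(range(50)))
print(report)
```

Reporting the median and maximum alongside the mean helps show whether occasional slow frames would break a real-time budget. If part of the pipeline runs asynchronously on a GPU, the device would need to be synchronised before each timestamp for the numbers to be meaningful.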
The current results do not demonstrate whether each pipeline component is essential or optimal, due to the lack of ablation experiments.
There is no quantitative evaluation of the system’s teaching impact—for example, whether model outputs (scoring or classifications) improve learner performance or provide meaningful guidance to coaches. This is a critical limitation, as the paper is positioned around a teaching assistant system.
Additionally, there is no feedback mechanism described. In a teaching context, it is essential to explain how system outputs translate into feedback, such as real-time corrections, suggested drills, or performance summaries. This pedagogical loop must be clearly defined for the system to function as a teaching assistant.
The paper presents an interesting interdisciplinary blend of machine learning methods for sports instruction. However, the integration strategy across modules (RL, GA, and PoseC3D) should be better explained.
To increase clarity and scientific contribution, the authors are encouraged to:
•	Formally define the reinforcement learning framework (states, actions, rewards).
•	Include ablation studies to assess the effectiveness of the RL and GA components.
•	Provide implementation details of PoseC3D: training status, input type, and sample outputs.
•	Expand or supplement the dataset and disclose dataset diversity.
•	Add real-time inference metrics and a clear pedagogical feedback mechanism to support teaching claims.
With these improvements, the manuscript could offer a stronger contribution to the fields of AI in sports training and human action understanding.
This study explores an auxiliary teaching system for tennis training by integrating pose recognition, keyframe extraction, and feature fusion techniques. The application context is promising, and the system-level thinking is commendable. However, there are still several moderate-to-major issues to address before the paper can be considered for publication.
	Section 3.2 mentions multi-feature fusion based on genetic algorithms, but provides no detail on the encoding strategy, fitness function, mutation/crossover rates, or convergence criteria. Clarify whether this is a heuristic layer or an optimization layer.
	The system is not benchmarked against prior works in intelligent sports coaching or action recognition (e.g., OpenPose-based systems or wearable sensor-based approaches). Comparative performance would better justify the contribution.
	While the GUI is briefly shown, how a student or coach interacts with it (e.g., receiving a score or comparing a stroke with a standard movement) is not well explained. Consider describing a sample user journey.
	Some of the images (e.g., tennis player pose sequences) are low resolution and hard to interpret. Consider replacing rasterized images with scalable vector diagrams or annotated snapshots.
	The paper lacks a critical discussion of its current limitations. For example, the small dataset, model overfitting risk, or system latency in real-time settings should be acknowledged.
	Several references are older or review-based. Add citations of recent papers on human action recognition using transformers or real-time pose estimation models relevant to teaching aids.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.