Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

  • The initial submission of this article was received on July 11th, 2025 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on September 8th, 2025.
  • The first revision was submitted on September 25th, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on October 13th, 2025.

Version 0.2 (accepted)

Academic Editor

Accept

Thank you for your resubmission. I am pleased to inform you that your manuscript is being recommended for publication.
Thank you for your fine contribution.

[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]

Reviewer 2

Basic reporting

The paper is improved.

Experimental design

A well-established and updated design with improved performance.

Validity of the findings

The validity of the findings is state of the art and novel.

Additional comments

My comments have been incorporated, and I am happy to share that I have no more comments.

Version 0.1 (original submission)

Academic Editor

Major Revisions

Dear authors,

Your manuscript has been reviewed by experts in the field, and you will see that they are advising against publication of the paper in its current form and are suggesting a number of changes. We therefore invite you to carefully revise your paper in light of these comments and resubmit.

AE Comments:
The framework for real-time human behavior detection during online dance classes is groundbreaking. Its use of evolutionary algorithms and deep learning skeleton models achieved 95.47% accuracy detecting behavior in the UTKinect dataset with less than 0.08 seconds per frame, under 0.6 seconds in total, with a latency of less than 0.6 seconds. That’s impressive. Still, the approach could benefit from a description of the specific evolutionary algorithm and its parameters (for instance: population size, mutation rate) as well as a described methodology for occlusions, mild interference, and poor lighting conditions an approach.

The fusion of temporal and spatial features (torso angle, joint positions, limb velocity) into a frame-level expansion vector lacks clarity on weighting or normalization techniques, and supplementing the UTKinect dataset with a custom or more diverse dataset (e.g., NTU RGB+D) would strengthen generalizability, especially for Pakistan’s varied educational contexts. Additionally, specifying hardware requirements for scalability on lower-end devices, providing an error analysis for misclassifications, and briefly outlining adaptations for applications like rehabilitation or surveillance would boost practical utility. Minor improvements in grammar (e.g., capitalizing "Long Short-Term Memory") and a visual diagram of the framework would further enhance readability and comprehension.
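To illustrate the kind of normalization detail being requested (this is a minimal sketch of one common approach, not the authors' actual pipeline; all feature names and dimensions here are assumptions), per-feature-group z-score normalization before concatenation into the frame-level vector could look like:

```python
import math

def zscore(values):
    """Z-score normalize a list of feature values (assumed pipeline step)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # guard against constant features
    return [(v - mean) / std for v in values]

def frame_vector(joint_xy, prev_joint_xy, torso_angle, dt=1 / 30):
    """Fuse spatial and temporal features into one frame-level vector.

    joint_xy / prev_joint_xy: flat [x0, y0, x1, y1, ...] joint coordinates
    from consecutive frames; torso_angle: scalar in radians.
    All names are hypothetical.
    """
    # Temporal feature: per-coordinate limb velocity (finite difference).
    velocity = [(a - b) / dt for a, b in zip(joint_xy, prev_joint_xy)]
    # Normalize each feature group separately so no group dominates.
    return zscore(joint_xy) + zscore(velocity) + [torso_angle]

vec = frame_vector([0.1, 0.2, 0.4, 0.8], [0.1, 0.2, 0.3, 0.7], 0.52)
# 4 normalized positions + 4 normalized velocities + 1 angle = 9 features
```

Normalizing each group separately keeps large-magnitude velocities from swamping the position and angle features; the manuscript should state which scheme it actually uses.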

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1

Basic reporting

no comment

Experimental design

1. Although the research gap has been well identified, namely dealing with occlusion, unfortunately there are still few references that support this idea. This weakens the argument and urgency of the research.
2. At the end of the background section, the author offers a solution using the Long Short-Term Memory (LSTM) method. However, it should be noted that: (a) LSTM is a widely used method, so the author needs to demonstrate specific developments or innovations in its application within the context of this research. (b) In subsection 3.2 on the Human Action Recognition Process, the author explains that the method used is Random Forest, which is inconsistent with the previous statement regarding the use of LSTM. This inconsistency needs to be addressed to make the research methodology clearer and more focused.
3. Information regarding the structure of the LSTM layers and the parameters used is not presented in the manuscript. However, these details are important for explaining the architecture of the model being built.
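As an illustration of the kind of architectural detail being requested (the layer sizes below are hypothetical, not taken from the manuscript), reporting per-layer widths together with trainable-parameter counts, via the standard formula of four gates each holding an input weight matrix, a recurrent weight matrix, and a bias, would make the model architecture verifiable:

```python
def lstm_param_count(input_size, hidden_size):
    """Trainable parameters in one standard LSTM layer:
    4 gates, each with an input weight matrix (hidden x input),
    a recurrent weight matrix (hidden x hidden), and a bias vector.
    """
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

# Hypothetical 2-layer stack over a 9-dimensional frame feature vector.
layers = [(9, 64), (64, 32)]  # (input_size, hidden_size) per layer
total = sum(lstm_param_count(i, h) for i, h in layers)
```

A table of this form (layer, input size, hidden size, parameter count) in the manuscript would resolve the point above.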

Validity of the findings

1. The model training process is also not shown, making it hard for readers to judge the quality and learning stages of the model developed.
2. The research results aren't presented properly. A comparison with previous research is suddenly made without first presenting the results of the experiments conducted by the authors themselves.
3. The attached testing video does not show someone dancing, making it irrelevant to the research context regarding online dance classes.
4. The conclusions presented lack a strong foundation as they are not supported by experimental data or adequate discussion.

Additional comments

-

Reviewer 2

Basic reporting

In order to improve online dance instruction, this article introduces a real-time human behavior recognition framework that can precisely identify and assess students' movements. To increase recognition accuracy and generalization, the system makes use of a deep learning model based on skeletons combined with evolutionary optimization. The framework uses continuous video input from a CSI camera module to extract skeletal joint coordinates and compute hybrid temporal-spatial features, including limb velocities, joint positions, and torso angles. The paper is good, but it needs to incorporate the following significant changes.

Experimental design

1. The evaluation is limited to the UTKinect dataset, which is relatively small and lacks diversity in viewpoints and complex real-world scenarios.
2. The reported real-time latency of 0.6 seconds is promising; however, performance across different hardware configurations and low-resource devices remains unexplored.
3. The manuscript claims that the evolutionary algorithm improves feature representation and classification structure but lacks details on its operational mechanism. How does the evolutionary process interact with the LSTM training loop? Are network parameters, feature weights, or hyperparameters being optimized? A clear algorithmic flow and pseudo-code would greatly enhance clarity and credibility.
4. Since LSTM is used as the core classifier, a baseline model without evolutionary optimization should be included to isolate the impact of the algorithmic enhancement.
5. Figures 3–6 are insufficiently labeled. Add descriptive captions, axis labels, and units (e.g., accuracy %, latency in seconds).
6. The real-time latency (0.6 s) is reported, but there is no profiling breakdown. Indicate time spent on joint detection, feature extraction, and classification.
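On the requested algorithmic flow: one plausible reading (an assumption on this reviewer's part, since the manuscript does not specify) is an outer evolutionary loop searching LSTM hyperparameters, with validation accuracy as the fitness. A minimal generational sketch, with a stand-in fitness function in place of actual LSTM training, might look like:

```python
import random

random.seed(0)

def fitness(candidate):
    """Stand-in for 'train the LSTM with these hyperparameters and
    return validation accuracy'. Toy surface peaking near lr=0.01, h=64."""
    lr, hidden = candidate
    return -abs(lr - 0.01) * 10 - abs(hidden - 64) / 100

def mutate(candidate, rate=0.3):
    """Perturb learning rate and hidden size with probability `rate`."""
    lr, hidden = candidate
    if random.random() < rate:
        lr *= random.uniform(0.5, 2.0)
    if random.random() < rate:
        hidden = max(8, hidden + random.choice([-16, 16]))
    return (lr, hidden)

def evolve(pop_size=10, generations=20, elite=2):
    pop = [(random.uniform(1e-4, 0.1), random.choice([32, 64, 128]))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]                        # elitist selection
        children = [mutate(random.choice(parents))   # offspring by mutation
                    for _ in range(pop_size - elite)]
        pop = parents + children
    return max(pop, key=fitness)

best_lr, best_hidden = evolve()
```

Pseudo-code at roughly this level of detail, stating what the genome encodes, the fitness function, and the selection/mutation scheme, is what the manuscript needs to make the evolutionary component verifiable.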

Validity of the findings

1. Using a single dataset limits generalizability. If other datasets are unavailable, discuss this limitation and justify why UTKinect is sufficient.
2. Combining skeleton features with LSTM is not new. The contribution lies in using evolutionary optimization, which needs clearer algorithmic detail.
3. Many equations (Eqs. 1–6) are shown without variable definitions or derivation context. Provide a symbol table or inline definitions.
4. Only eight behaviors with two repetitions each are insufficient for robust evaluation. Clarify the total frame count and participant diversity.

Additional comments

The authors need to improve the language of the manuscript.

Reviewer 3

Basic reporting

Paper Summary:

The paper proposes a framework for recognizing dance actions in real-time. It uses a camera to capture video, extracts skeleton joint data, and computes features like joint angles and velocities. These features are fed into an LSTM network for classification. The authors mention using an evolutionary algorithm for optimization but do not describe it. Experiments on a subset of the UTKinect dataset show high accuracy and low latency on a Jetson Nano device.

Strengths:

+ The paper is easy to follow.

+ The focus on using AI for real-time sports instruction, specifically dance, is a compelling and socially valuable application. This interdisciplinary approach has clear potential for real-world impact.

+ The demonstration of the system running on an embedded platform with low latency is a practical strength, relevant for applied computing.

Concerns:

- Novelty: The core methodology is a well-established pipeline in computer vision (skeleton extraction -> hand-crafted features -> LSTM). The work does not identify a specific, unsolved challenge in this domain, nor does it propose a novel algorithm or theory to address one.

- Comparisons: The choice of baselines for comparison is a critical flaw. The field of skeleton-based action recognition has advanced significantly, with modern methods like CTR-GCN [1*], PoseConv3D [2*], and SkateFormer [3*] becoming standard benchmarks. Comparing against older methods (from 2016-2018) does not convincingly demonstrate the effectiveness of the proposed approach.

- Experimental Analysis: The evaluation lacks depth. There is no ablation study to show the contribution of key components (e.g., What is the evolutionary algorithm actually improving? How important is each hand-crafted feature?). The analysis does not discuss why the method works or fails in certain cases, what the limitations are, or under what scenarios it would not perform well. This misses the chance to provide scientifically valuable insights.

- Methodological Details: The evolutionary algorithm (EA) is mentioned as a key component but is not described at all. Without details on the type of EA, what is being optimized, and the fitness function, this core claim is unverifiable.

- Writing and Presentation: The language requires improvement to meet academic standards. Many sentences are unclear or awkwardly phrased, which hinders understanding. The figures are also simplistic and lack professional presentation.

Overall, the authors need to address the fundamental issues of novelty and rigorous comparison to make this manuscript suitable for publication.

References:
[1*] Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. In ICCV, 2021.
[2*] Revisiting Skeleton-based Action Recognition. In CVPR, 2022.
[3*] SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition. In ECCV, 2024.

**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.

Experimental design

The experiments are well designed, however, deeper analysis and comparisons with more recent state-of-the-art are needed. See the "Concerns" in Basic reporting.

Validity of the findings

The claimed novelty of integrating an EA with a skeleton-based LSTM is modest. While the application context is specific, the core methodological combination is not groundbreaking. The manuscript does not demonstrate a clear advancement over the state-of-the-art. See the "Concerns" in the Basic reporting.

Additional comments

No Comment.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.