Angle information assisting skeleton-based actions recognition

PeerJ Computer Science

Introduction

  • We propose a cosine stream, which quantifies the joint range of motion as the angle between two bones in degrees, to assist in recognizing actions with significant displacement.

  • We enhance the existing downsampling algorithm by integrating the keyframe concept, which yields substantial improvements in both the joint and bone streams.

  • The experimental results indicate that our method improves model accuracy without requiring modifications to the network structure itself.

Method

Cosine stream
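
As a minimal sketch of the idea stated in the introduction (the angle between two bones, in degrees), the snippet below computes per-frame angles from joint coordinates. The function name `bone_angles`, the array layout `(frames, joints, coords)`, the choice of bone pairs, and the handling of zero-length bones are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def bone_angles(joints, bone_pairs):
    """Angle in degrees between pairs of bones, for every frame.

    joints:     array of shape (T, V, C), T frames, V joints, C coordinates.
    bone_pairs: list of ((a, b), (c, d)); each bone is the vector from the
                first joint index to the second (hypothetical convention).
    Returns an array of shape (T, len(bone_pairs)).
    """
    joints = np.asarray(joints, dtype=float)
    angles = np.empty((joints.shape[0], len(bone_pairs)))
    for k, ((a, b), (c, d)) in enumerate(bone_pairs):
        u = joints[:, b] - joints[:, a]          # first bone vector per frame
        v = joints[:, d] - joints[:, c]          # second bone vector per frame
        dot = (u * v).sum(axis=-1)
        denom = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
        # Assumption: a zero-length bone yields cosine 0 (i.e. 90 degrees).
        cos = np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)
        angles[:, k] = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles

# Two frames, three 2D joints: a right-angle bend, then a straight limb.
demo = np.array([
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
])
demo_angles = bone_angles(demo, [((0, 1), (1, 2))])
print(demo_angles)
```

With this direction convention, a straight limb gives an angle of 0 and a right-angle bend gives 90; the cosine itself is the intermediate value `cos` before the `arccos` step.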

Keyframe sampling algorithm

  • (0)

    Create a sequence of N+M frames using uniform sampling as the control sequence.

  • (1)

    Sort the video frames by the keyframe selection indicator. Choose the M frames with the smallest indicator values to create a new subsequence, and append it to the N-frame subsequence obtained by uniform sampling.

  • (2)

    Divide the sequence into M non-overlapping segments. Select the frame with the smallest indicator value from each segment to form a new M-frame subsequence, and append it to the N-frame subsequence obtained by uniform sampling.

  • (3)

    Building upon (1), rearrange the generated N+M frames chronologically to create a new downsampling sequence.

  • (4)

    Building upon (2), rearrange the generated N+M frames chronologically to create a new downsampling sequence.
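
The five strategies above can be sketched as follows. This is an illustrative reading of the list, not the authors' code: the names `uniform_indices` and `keyframe_sample`, the rounding used for uniform sampling, keeping duplicate frames, and ordering the strategy (1) keyframes by indicator value are all assumptions. The indicator is taken as one precomputed value per frame.

```python
import numpy as np

def uniform_indices(total, count):
    """Evenly spaced frame indices over [0, total - 1] (assumed rounding)."""
    return np.linspace(0, total - 1, count).round().astype(int)

def keyframe_sample(indicator, n, m, strategy):
    """Return frame indices for strategies (0) to (4).

    indicator: one keyframe-selection value per frame (smaller = selected).
    n, m:      number of uniformly sampled frames and of keyframes.
    Assumes the video has at least m frames.
    """
    indicator = np.asarray(indicator, dtype=float)
    total = len(indicator)
    if strategy == 0:
        # Control: plain uniform sampling of N + M frames.
        return uniform_indices(total, n + m)
    base = uniform_indices(total, n)
    if strategy in (1, 3):
        # The M frames with the smallest indicator over the whole video.
        keys = np.argsort(indicator, kind="stable")[:m]
    elif strategy in (2, 4):
        # One frame with the smallest indicator from each of M segments.
        segments = np.array_split(np.arange(total), m)
        keys = np.array([seg[np.argmin(indicator[seg])] for seg in segments])
    else:
        raise ValueError("strategy must be an integer from 0 to 4")
    sampled = np.concatenate([base, keys])
    if strategy in (3, 4):
        # Restore chronological order across all N + M frames.
        sampled = np.sort(sampled, kind="stable")
    return sampled

# Toy indicator for an 8-frame clip (values are made up for illustration).
toy = [5, 1, 4, 0, 3, 2, 6, 7]
print(keyframe_sample(toy, 2, 2, 0).tolist())  # [0, 2, 5, 7]
print(keyframe_sample(toy, 2, 2, 1).tolist())  # [0, 7, 3, 1]
print(keyframe_sample(toy, 2, 2, 2).tolist())  # [0, 7, 3, 5]
print(keyframe_sample(toy, 2, 2, 3).tolist())  # [0, 1, 3, 7]
print(keyframe_sample(toy, 2, 2, 4).tolist())  # [0, 3, 5, 7]
```

In this toy run, strategies (1) and (2) leave the appended keyframes after the last uniformly sampled frame, while (3) and (4) return indices in increasing order.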

Experiments

Datasets

Implementation details

Experiment result

Ablation study

  • Strategy (1): Simply appends the M strongest frames, as ranked by the indicator, to the N-frame subsequence. However, this disrupts the temporal continuity of the actions since the strongest frames may not be sequential, resulting in a slight decrease in accuracy.

  • Strategy (3): After selecting the M keyframes, it immediately downsamples and reorders both the keyframes and the N-frame subsequence in the temporal dimension to ensure continuity. This approach significantly improves accuracy.

  • Strategy (2): Demonstrates another method of keyframe selection by dividing the video into M non-overlapping temporal segments and selecting the frame with the strongest indicator from each segment to form a new subsequence. This method naturally maintains temporal continuity and has a more even distribution, leading to better performance compared to Strategy (1).

  • Strategy (4): Goes a step further by rearranging all N+M sampled frames chronologically. This finer adjustment enhances the assistance provided to the model, resulting in further improvements.
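
To make the temporal-continuity argument concrete, the sketch below counts how often a sampled sequence jumps backwards in time. The helper `count_order_breaks` and the example index lists are hypothetical and only illustrate the effect of appending keyframes versus reordering them chronologically.

```python
def count_order_breaks(frame_indices):
    """Count adjacent pairs where the next frame is earlier than the current one."""
    return sum(1 for cur, nxt in zip(frame_indices, frame_indices[1:]) if nxt < cur)

# Hypothetical sampled sequences for an 8-frame clip.
appended = [0, 7, 3, 1]        # keyframes appended after uniform frames
reordered = sorted(appended)   # the same frames in chronological order

print(count_order_breaks(appended))   # 2
print(count_order_breaks(reordered))  # 0
```

A count of zero means the model sees the frames in their original order, which is the property that strategies (3) and (4) restore.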

Cross-dataset validations

Limitations

Conclusion

Supplemental Information

Raw experimental data and dataset links.

DOI: 10.7717/peerj-cs.2523/supp-1

Additional Information and Declarations

Competing Interests

Yidan Chen is employed by Shangu Cyber Security Technology Company Limited.

Author Contributions

Chengming Liu conceived and designed the experiments, analyzed the data, performed the computation work, authored or reviewed drafts of the article, and approved the final draft.

Jiahao Guan conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Haibo Pang analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Lei Shi analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Yidan Chen analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The code and data are available at GitHub and Zenodo:

- https://github.com/Jiahao-Guan/pyskl_cosine

- Jiahao Guan. (2024). Jiahao-Guan/pyskl_cosine: v0.1 (first). Zenodo. https://doi.org/10.5281/zenodo.13679682.

The FSD-10 dataset is available at GitHub: https://shenglanliu.github.io/fsd10.

The FineGym and NTU RGBD 60 datasets are available at Zenodo: Guan. (2024). gym_nturgbd60 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14049728.

Funding

This work was supported by the National Key R&D program of China (2020YFB1712401), the Nature Science Foundation of China (62006210, 62206252), the Key science and technology project of Henan province (221100211200, 221100210100), and the technological research projects in Henan province (232102210090). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
