Multi-aspect badminton video understanding based on deep learning methods
Abstract
This article presents a multi-task video analysis framework tailored to badminton matches, aiming to provide a comprehensive semantic understanding of play through shuttlecock detection, player segmentation, stroke event recognition, and keypoint-based technical evaluation. Unlike existing approaches that primarily target single-task recognition in sports such as table tennis or soccer, our system addresses the unique challenges of badminton (high-speed movement, frequent occlusions, and rapid pose changes) through a modular neural architecture with temporal modeling capabilities. The core architecture is adapted and extended from TTNet, integrating temporal modules and task-specific branches to enable joint training and optimization across multiple tasks. To support training and evaluation, we construct and annotate a structured badminton dataset, VideoBadminton, with multi-level labels including shuttle trajectories, player bounding boxes, stroke event types, and skeletal keypoints. Experimental results demonstrate that the proposed framework outperforms state-of-the-art baselines in detection accuracy, temporal consistency, and multi-task coordination. Furthermore, we develop a visualization module to support coaching and performance evaluation. This article establishes a scalable technical paradigm for intelligent sports video analysis and provides a methodological foundation for badminton-oriented AI training and education systems.
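To make the multi-task design summarized above more concrete, the minimal PyTorch sketch below shows one plausible way a shared spatio-temporal backbone could feed task-specific branches trained with a weighted joint loss. All module names, tensor dimensions, class counts, and loss weights are illustrative assumptions, not the paper's actual implementation or the TTNet API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskBadmintonNet(nn.Module):
    """Sketch: shared 3D-conv backbone over a short frame clip (temporal
    modeling) feeding four task-specific branches. Hypothetical design."""

    def __init__(self, num_event_classes: int = 10, num_keypoints: int = 17):
        super().__init__()
        # Shared spatio-temporal feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 28, 28)),  # collapse time, fix spatial size
        )
        # Task-specific branches (names are assumptions).
        self.shuttle_head = nn.Conv2d(64, 1, kernel_size=1)             # shuttle location heatmap
        self.segmentation_head = nn.Conv2d(64, 2, kernel_size=1)        # player / background logits
        self.event_head = nn.Linear(64 * 28 * 28, num_event_classes)    # stroke event logits
        self.keypoint_head = nn.Conv2d(64, num_keypoints, kernel_size=1)  # pose keypoint heatmaps

    def forward(self, clip: torch.Tensor) -> dict:
        # clip: (batch, 3, T, H, W) short video snippet
        feat = self.backbone(clip).squeeze(2)  # (batch, 64, 28, 28)
        return {
            "shuttle": self.shuttle_head(feat),
            "segmentation": self.segmentation_head(feat),
            "event": self.event_head(feat.flatten(1)),
            "keypoints": self.keypoint_head(feat),
        }


def joint_loss(outputs: dict, targets: dict, weights=(1.0, 1.0, 1.0, 1.0)) -> torch.Tensor:
    """Weighted sum of per-task losses for joint optimization (weights illustrative)."""
    l_shuttle = F.mse_loss(outputs["shuttle"], targets["shuttle"])
    l_seg = F.cross_entropy(outputs["segmentation"], targets["segmentation"])
    l_event = F.cross_entropy(outputs["event"], targets["event"])
    l_kpt = F.mse_loss(outputs["keypoints"], targets["keypoints"])
    return (weights[0] * l_shuttle + weights[1] * l_seg
            + weights[2] * l_event + weights[3] * l_kpt)
```

In this kind of arrangement, a single optimizer step backpropagates the combined loss through the shared backbone, which is what allows the tasks to be trained and optimized jointly rather than as separate single-task models.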