Automated music performance evaluation (AMPE) has become a crucial task in intelligent music tutoring systems, aiming to provide reliable and pedagogically meaningful assessments of student performances. However, existing methods focus predominantly on either symbolic-audio alignment or pitch and rhythm correctness; they can neither model expressive dimensions nor explain evaluation outcomes in terms of expert rubrics. To address these limitations, we propose AutoMPE, a unified multimodal framework that integrates symbolic, acoustic, and visual modalities to enable holistic and interpretable performance evaluation.
The proposed framework enhances alignment accuracy and interpretability through three tightly integrated components. First, the Cross-Attentive Alignment Module (CAAM) employs a bi-directional cross-modal attention mechanism to achieve fine-grained temporal and semantic alignment between symbolic scores and acoustic features. Second, the Hierarchical Expressive Decoder (HED) captures both short-term dynamics and long-term structural phrasing by modeling fused audio-visual input through a multi-level encoder-decoder architecture. Finally, the Knowledge-Guided Contrastive Embedding (KGCE) module constructs a rubric-aligned latent space via supervised contrastive learning, promoting consistency and transparency in scoring decisions by aligning student performances with expert-defined evaluation prototypes.
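As an illustration of the bi-directional cross-modal attention idea behind CAAM, the following is a minimal sketch, not the AutoMPE implementation. It assumes plain scaled dot-product attention without learned projections, and all names (`cross_attend`, `bidirectional_cross_attention`, `score_feats`, `audio_feats`) as well as the feature shapes are hypothetical choices made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, keys_values):
    """Scaled dot-product attention in which each query row attends over
    all rows of keys_values (used as both keys and values).

    Assumption: no learned query/key/value projections, to keep the
    sketch short. A real module would almost certainly include them.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys_values, weights


def bidirectional_cross_attention(score_feats, audio_feats):
    """Attend in both directions: score -> audio and audio -> score."""
    score_ctx, score_to_audio = cross_attend(score_feats, audio_feats)
    audio_ctx, audio_to_score = cross_attend(audio_feats, score_feats)
    return score_ctx, audio_ctx, score_to_audio, audio_to_score


# Hypothetical toy input: 4 symbolic note events and 10 audio frames,
# both already embedded in a shared 8-dimensional space.
rng = np.random.default_rng(0)
score_feats = rng.normal(size=(4, 8))
audio_feats = rng.normal(size=(10, 8))

score_ctx, audio_ctx, s2a, a2s = bidirectional_cross_attention(
    score_feats, audio_feats
)
print(score_ctx.shape, audio_ctx.shape, s2a.shape, a2s.shape)
```

In this sketch, each row of `s2a` is a soft alignment of one note event over the audio frames, and each row of `a2s` is the reverse; how the two directions are fused in CAAM is not specified here.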
Experimental results on real-world student datasets demonstrate that AutoMPE substantially outperforms state-of-the-art baselines. In particular, the model achieves higher Alignment Accuracy (AA) and lower Expression RMSE (E-RMSE) than benchmark expressive models. Furthermore, in blinded Turing-style evaluations, music instructors assign significantly higher Interpretability Scores (IS), supporting the model’s educational utility. These results underscore AutoMPE’s ability to deliver robust, nuanced, and human-aligned evaluation across correctness, expressivity, and pedagogy, establishing a strong foundation for next-generation AI-assisted music instruction.