AutoMPE: A multimodal framework with cross-attentive alignment and knowledge-guided contrastive embedding for interpretable automated music performance evaluation


Abstract

Automated music performance evaluation (AMPE) has become a crucial task in intelligent music tutoring systems, aiming to provide reliable and pedagogically meaningful assessments of student performances. However, existing methods predominantly focus on either symbolic-audio alignment or pitch/rhythm correctness, lacking the capacity to model expressive dimensions and explain evaluation outcomes in alignment with expert rubrics. To address these challenges, we propose AutoMPE, a unified multimodal framework that integrates symbolic, acoustic, and visual modalities to enable holistic and interpretable performance evaluation.

The proposed framework enhances alignment accuracy and interpretability through three tightly integrated components. First, the Cross-Attentive Alignment Module (CAAM) employs a bi-directional cross-modal attention mechanism to achieve fine-grained temporal and semantic alignment between symbolic scores and acoustic features. Second, the Hierarchical Expressive Decoder (HED) captures both short-term dynamics and long-term structural phrasing by modeling fused audio-visual input through a multi-level encoder-decoder architecture. Finally, the Knowledge-Guided Contrastive Embedding (KGCE) module constructs a rubric-aligned latent space via supervised contrastive learning, promoting consistency and transparency in scoring decisions by aligning student performances with expert-defined evaluation prototypes.
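As a rough illustration of the bi-directional cross-modal attention idea behind CAAM, the sketch below applies plain scaled dot-product attention in both directions between score tokens and audio frames. It is a minimal sketch, not the authors' implementation: the function names (`cross_attend`, `bidirectional_cross_attention`) are hypothetical, the embeddings are toy vectors, and learned query/key/value projections and multiple heads are omitted for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query attends over keys_values.

    Assumption: keys and values are the same vectors (no learned projections).
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(keys_values[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        # Weighted average of the other modality's vectors.
        attended.append(
            [sum(w * kv[d] for w, kv in zip(weights, keys_values)) for d in range(dim)]
        )
    return attended


def bidirectional_cross_attention(score_tokens, audio_frames):
    """Return (score attended by audio, audio attended by score)."""
    score_ctx = cross_attend(score_tokens, audio_frames)
    audio_ctx = cross_attend(audio_frames, score_tokens)
    return score_ctx, audio_ctx


# Toy example: 3 score tokens and 2 audio frames, each a 2-dimensional embedding.
score = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[2.0, 0.0], [0.0, 2.0]]
score_ctx, audio_ctx = bidirectional_cross_attention(score, audio)
```

Each output vector is a convex combination of the other modality's vectors, so a score token ends up weighted toward the audio frames it most resembles, and vice versa.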

Experimental results on real-world student datasets demonstrate that AutoMPE substantially outperforms state-of-the-art baselines. In particular, the model achieves superior Alignment Accuracy (AA) and lower Expression RMSE (E-RMSE) compared to benchmark expressive models. Furthermore, blinded Turing-style evaluations with music instructors report significantly higher Interpretability Scores (IS), affirming the model’s educational utility. These results underscore AutoMPE’s ability to deliver robust, nuanced, and human-aligned evaluation across correctness, expressivity, and pedagogy, establishing a strong foundation for next-generation AI-assisted music instruction.
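The abstract does not define E-RMSE. Assuming it is an ordinary root-mean-square error between predicted and expert-assigned expression scores, it could be computed as in the sketch below. The function name `rmse` and the sample scores are illustrative assumptions, not values from the paper.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length score sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical expression scores: model predictions vs. expert ratings.
model_scores = [7.5, 6.0, 8.0]
expert_scores = [8.0, 6.0, 7.0]
error = rmse(model_scores, expert_scores)
```

A lower value means the predicted expression scores sit closer to the expert ratings.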

Notes for potential reviewers

  • Volunteering is not a guarantee that you will be asked to review. There are many possible reasons for this: reviewers must be qualified, there must be no conflicts of interest, a minimum of two reviewers may already have accepted an invitation, etc.
  • This is NOT OPEN peer review. The review is single-blind, and all recommendations are sent privately to the Academic Editor handling the manuscript. All reviews are published and reviewers can choose to sign their reviews.
  • What happens after volunteering? It may be a few days before you receive an invitation to review with further instructions. You must accept the invitation to become an official referee for the manuscript. If you do not receive an invitation, it is for one of the many possible reasons noted above.

  • PeerJ Computer Science does not judge submissions based on subjective measures such as novelty, impact or degree of advance. Effectively, reviewers are asked to comment on whether or not the submission is scientifically and technically sound and therefore deserves to join the scientific literature. Our Peer Review criteria can be found on the "Editorial Criteria" page. Reviewers are specifically asked to comment on three broad areas: "Basic Reporting", "Experimental Design" and "Validity of the Findings".
  • Reviewers are expected to comment in a timely, professional, and constructive manner.
  • Until the article is published, reviewers must regard all information relating to the submission as strictly confidential.
  • When submitting a review, reviewers are given the option to "sign" their review (i.e. to associate their name with their comments). Otherwise, all review comments remain anonymous.
  • All reviews of published articles are published. This includes manuscript files, peer review comments, author rebuttals and revised materials.
  • Each time a decision is made by the Academic Editor, each reviewer will receive a copy of the Decision Letter (which will include the comments of all reviewers).

If you have any questions about submitting your review, please email us at [email protected].