Interpretable and multimodal framework for real-time affective feedback in music practice
Abstract
This paper develops an affect-aware, AI-driven feedback system for music education that integrates real-time emotion recognition with instructionally sound intervention strategies. Traditional teaching approaches largely treat performance as a purely technical endeavor and neglect the fluctuating emotional and cognitive states that shape how students learn, express themselves, and persist through challenges. As a result, feedback tends to be generic, delayed, and ill-equipped to address affective barriers such as anxiety, frustration, or disengagement that can impede musical development. To address this gap, we present a multimodal framework that jointly analyzes physiological signals and behavioral indicators to infer learners’ emotional states during practice. Trained on a curated dataset of 100 expert-annotated practice sessions, our FT-Transformer classifier distinguishes eight emotion categories grounded in psychophysiological taxonomy with 95\% accuracy and a macro-F1 score of 0.75; the lower macro-F1, which weights all eight classes equally, reflects both class diversity and real-world variability. Importantly, the model’s decisions are not opaque: the features that drive its predictions align with established psychophysiological markers of emotion. This alignment not only validates our feature engineering but also reinforces the physiological plausibility of the system’s inferences. In essence, this work demonstrates that effective AI in education need not choose between technical sophistication and pedagogical relevance. By tightly coupling interpretable affect recognition with discipline-specific instructional logic, we offer a transparent, scalable blueprint for intelligent tutoring systems that respond not just to what a student plays, but to how they feel while playing.
Given its modular design and grounding in universal affective mechanisms, the framework shows strong potential for adaptation beyond music, particularly in other high-stakes, affect-sensitive domains such as language acquisition, public speaking, or therapeutic skill training.