FuMoE-csKT: Fusing Disentangled and Self-Attention in a Personalized Mixture-of-Experts Framework for Cold-Start Knowledge Tracing
Abstract
Knowledge Tracing (KT) aims to model a student's evolving mastery of knowledge components (KCs) by leveraging their historical interaction records. However, in real-world online learning systems, a cold-start problem inevitably arises when new students enter the system with limited interaction data. Despite the emergence of numerous deep learning-based KT models, effectively addressing cold-start scenarios remains a persistent and significant challenge. In this paper, we propose FuMoE-csKT, which fuses disentangled attention and self-attention in a personalized mixture-of-experts framework for cold-start knowledge tracing. At the core of our approach is a fused attention mechanism that combines disentangled attention and self-attention within a unified module, further enhanced by a learnable kernel bias function. This design significantly improves the model's capacity for sequence modeling and temporal robustness, especially in capturing multi-dimensional relations in short interaction sequences. To further promote representation diversity and personalization, the output of the attention module is passed through a soft Mixture-of-Experts (MoE) layer, in which multiple expert networks are softly selected via a gating mechanism. This expert-level diversity allows the model to adaptively learn from varied student behavior patterns, even under severe data sparsity. We conduct comprehensive evaluations of FuMoE-csKT on four real-world educational datasets under cold-start conditions, using two widely used evaluation metrics: Area Under the Curve (AUC) and Accuracy (ACC). The results show that the proposed framework consistently outperforms existing baselines in most cases. Additionally, we perform thorough ablation studies and efficiency analyses to validate the effectiveness and robustness of our framework.
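The abstract describes two core components: a fused attention module over the interaction sequence and a soft Mixture-of-Experts layer whose experts are mixed by a gating network. As a rough illustration of how a soft-MoE layer of this kind could combine expert outputs over attention features, the following minimal PyTorch sketch is provided; the module names, dimensions, standard multi-head self-attention stand-in, and softmax gate are assumptions for illustration only, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class SoftMoELayer(nn.Module):
    """Illustrative soft Mixture-of-Experts layer: every expert processes the
    attention output and a per-token softmax gate mixes the expert outputs.
    Structure and hyperparameters are assumptions, not the paper's design."""

    def __init__(self, d_model: int, num_experts: int = 4, d_hidden: int = 128):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(d_model, num_experts)  # produces per-token expert weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), e.g. the fused-attention output
        gate_weights = torch.softmax(self.gate(x), dim=-1)              # (B, T, E)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=-2)  # (B, T, E, D)
        # Weighted sum of expert outputs: soft selection rather than hard routing
        return (gate_weights.unsqueeze(-1) * expert_outs).sum(dim=-2)   # (B, T, D)


if __name__ == "__main__":
    # Toy usage: attention over a short interaction sequence, then the soft MoE.
    B, T, D = 2, 10, 64
    x = torch.randn(B, T, D)
    attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)
    attn_out, _ = attn(x, x, x)  # stand-in for the fused attention module
    y = SoftMoELayer(d_model=D)(attn_out)
    print(y.shape)  # torch.Size([2, 10, 64])
```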