Design and evaluation of a question-answering system based on knowledge graph-augmented large language models for the K-12 artificial intelligence curriculum
Abstract
The education sector is undergoing a digital transformation that is fostering an artificial intelligence (AI)-enabled, learner-centered education ecosystem. General AI education leverages large language models (LLMs) to create a new paradigm for intelligent teaching. However, LLMs often produce factual errors and insufficient explanations, which can degrade teaching quality. The knowledge-credibility bottleneck must therefore be overcome, and the interpretability of LLM reasoning in professional teaching contexts improved. To address this problem, this study proposes a question-answering system for the K-12 AI curriculum based on knowledge graph (KG)-augmented LLMs, which grounds LLM answers by externally injecting AI course KGs. The system is evaluated on a purpose-built AI curriculum dataset of 1098 data points, categorized into three difficulty levels, using the G-Eval automatic evaluation framework combined with reference-free metrics. Three mainstream LLMs serve as back ends, and the system is scored along five dimensions with DeepSeek-V3 as the evaluation model. The results indicate that the proposed system alleviates, to a certain extent, the "hallucination" problem of existing LLMs on AI course knowledge questions, underscoring the efficacy of curriculum KGs in mitigating cognitive biases and logical inconsistencies in LLM-generated domain-specific answers. The system thus offers a viable engineering approach to adapting generative AI to educational contexts. We believe this study has substantial practical value for improving the quality of general AI education and contributes to the further development of AI curriculum-based education.
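For readers unfamiliar with KG-augmented prompting, the following minimal sketch illustrates the general technique the abstract describes: retrieving course KG triples relevant to a student question and injecting them into the LLM prompt as grounding context. All names here (`Triple`, `retrieve_triples`, `build_prompt`), the toy triple store, and the keyword-overlap retrieval are illustrative assumptions for exposition only, not the paper's actual implementation.

```python
# Minimal sketch of KG-augmented question answering (illustrative only).
# The triple store, retrieval heuristic, and prompt template below are
# assumptions for exposition; they are not the paper's implementation.

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


# A toy AI-course knowledge graph as (subject, relation, object) triples.
COURSE_KG = [
    Triple("supervised learning", "requires", "labeled data"),
    Triple("decision tree", "is_a", "supervised learning model"),
    Triple("k-means", "is_a", "unsupervised clustering algorithm"),
]


def retrieve_triples(question: str, kg: list[Triple], k: int = 3) -> list[Triple]:
    """Rank triples by naive keyword overlap with the question."""
    q_tokens = set(question.lower().split())

    def overlap(t: Triple) -> int:
        t_tokens = set(f"{t.subject} {t.relation} {t.obj}".lower().split())
        return len(q_tokens & t_tokens)

    ranked = sorted(kg, key=overlap, reverse=True)
    return [t for t in ranked[:k] if overlap(t) > 0]


def build_prompt(question: str, triples: list[Triple]) -> str:
    """Inject retrieved KG facts into the prompt as grounding context."""
    facts = "\n".join(f"- {t.subject} {t.relation} {t.obj}" for t in triples)
    return (
        "Answer the student's question using ONLY the course facts below.\n"
        f"Course facts:\n{facts}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    q = "Does supervised learning need labeled data?"
    prompt = build_prompt(q, retrieve_triples(q, COURSE_KG))
    print(prompt)  # This grounded prompt would then be sent to the back-end LLM.
```

In a production system the keyword overlap would typically be replaced by graph traversal or embedding-based retrieval, but the design principle matches the abstract: constraining the LLM to externally supplied curriculum facts to reduce hallucination.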