Deep learning-based extractive and abstractive summarization for the Azerbaijani language
Abstract
This paper presents a novel approach to abstractive text summarization for the Azerbaijani language using a fine-tuned mT5-base model. Azerbaijani remains underrepresented in natural language processing and lacks comprehensive resources and research in automatic summarization. We address this gap by using a large-scale dataset of over 115,000 Azerbaijani texts paired with human-written summaries. The model is trained on this dataset and evaluated with standard metrics (ROUGE, BLEU, and METEOR), achieving promising results. Furthermore, performance limitations on long input texts are analyzed, and architectural and dataset-level considerations are discussed. This research serves as a foundation for future improvements in Azerbaijani NLP applications, particularly in text summarization and information retrieval.
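To make the evaluation setup concrete, the following is a minimal, self-contained sketch of unigram-overlap ROUGE-1 F1, one of the metrics the abstract names; the tokenization (whitespace splitting) and the example sentences are illustrative assumptions, not the paper's actual evaluation pipeline, which would typically use an established ROUGE implementation.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Compute ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Assumes simple whitespace tokenization; real evaluations usually
    apply language-specific normalization (e.g. lowercasing, stemming).
    """
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # Clipped unigram overlap between candidate and reference
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

A summary identical to its reference scores 1.0, while partial lexical overlap yields a proportionally lower score; BLEU and METEOR extend this idea with n-gram precision and synonym/stem matching, respectively.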