Natural language processing for Arabic poetry: A systematic review
Abstract
Poetry is a unique form of expression valued for its role in preserving cultural heritage. This fascination has spurred research into applying Natural Language Processing (NLP) and Machine Learning (ML) techniques to the analysis and generation of poetic texts. This paper presents the first systematic review of NLP- and ML-based approaches to Arabic poetry. Following the PRISMA guidelines, we conducted an exhaustive search across six major academic databases (ACL Anthology, IEEE Xplore, ACM Digital Library, SpringerLink, ScienceDirect, and Google Scholar) for relevant studies published between January 2010 and May 2025. For poetry classification, we analyzed 37 studies covering nine major classification categories, including meter, authorship, poem type, genre, era, emotion, and diacritization. For poetry generation, we reviewed and synthesized ten studies. For each category, we examined the primary methodologies employed, the evaluation metrics and results reported, and the key challenges and opportunities. Additionally, we compiled a comprehensive overview of the available datasets, tools, and other resources to support future work in computational poetry.