Detecting Depression in Urdu Texts Using Deep Learning Approaches
Abstract
Depression is highly prevalent in Pakistan: estimates of the share of the population that experiences depression at some point in life range from 34% to 44.4%, which is higher than the global average. This study proposes an approach for detecting depression in Urdu-language posts through sentiment analysis. The aim is to develop a model that predicts depression from posts, addressing the lack of Urdu-language resources for this task. The solution is based on a publicly available dataset of over 60,000 labeled tweets in Russian, which were translated into Urdu using the Google Translate API. Various machine learning, deep learning, and transformer models were applied to this dataset to evaluate how effectively and accurately they detect depression. A notable merit of the approach is that it is the first attempt to create a benchmark dataset for depression detection in Urdu. While many studies exist for other domains, such as hate speech and product reviews, no dedicated dataset or study currently exists for depression in Urdu. The models were evaluated comprehensively using Google Colab and open-source libraries. Accuracy, precision, recall, and F1 score were used to measure the models' performance on both the training and test sets. The experiments showed that transformer-based models outperformed traditional machine learning approaches, achieving higher accuracy in detecting signs of depression. The results confirm that the method can be used successfully for depression detection in a low-resource language like Urdu, filling an essential gap in sentiment analysis research. The proposed method can be used to develop mental health monitoring tools for Urdu-speaking populations and paves the way for future research in language-specific sentiment analysis for mental health.
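As an illustration of the four evaluation metrics named above, the following is a minimal sketch of how they can be computed for a binary depression classifier. It assumes labels are encoded as 1 (depressed) and 0 (not depressed); the function name `binary_metrics` and the sample labels are hypothetical and are not taken from the study's code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: 1 marks the positive class (depressed), 0 the negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for six posts, for illustration only.
example_true = [1, 0, 1, 1, 0, 0]
example_pred = [1, 0, 0, 1, 1, 0]
example_scores = binary_metrics(example_true, example_pred)
```

Reporting precision and recall alongside accuracy matters when the two classes are imbalanced, because a classifier can reach high accuracy while still missing most depressed posts.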