AI framework for DRIVE model based mental health detection in text: a case study on how coping strategies are expressed during COVID-19

PeerJ Computer Science

Introduction

Social networks are becoming essential for human life and have a great influence on one’s thoughts, relationships, and well-being. In contemporary societies, the benefits of social networks are evident; however, their influence is two-sided, as they also function as conduits to damaging stressors (Ostic et al., 2021). People share their thoughts, sentiments, and emotions prolifically through social media posts. As a consequence, a large body of research has been conducted to explore and leverage online social network data for mental health (MH) related analysis and detection.

Individuals employ various coping strategies to manage any challenges they may encounter, particularly in times of stress. Coping strategies can be broadly categorized into positive and negative strategies, each with distinct effects on well-being (Folkman & Lazarus, 1980). Given the long and persistent engagement of individuals with their social media platforms daily, finding traces of coping strategies in response to stressful experiences embedded in their posts is not unexpected. Thus, it would be helpful to institute an automatic alert for individuals who may be negatively coping with stressors and vulnerable to mental disorders and other negative outcomes.

To understand how individuals cope during stressful situations, the Demands Resources Individual Effects (DRIVE) model offers insights into the dynamic interplay between stressors, coping responses, and mental health outcomes (Mark & Smith, 2008). The DRIVE model is a comprehensive psychological framework that integrates key concepts related to well-being, namely demands, resources, and individual effects. It highlights the multifaceted nature of well-being and supports a holistic understanding of it.

In the area of computer science, a major body of research has emerged from text-based social media mining for psychological and MH-related problems, most notably during and after COVID-19. Much of this research is devoted to analyzing MH status and disorders, mostly through sentiment analysis (SA), also known as opinion mining, or emotion detection (ED). SA is a field of natural language processing (NLP) that involves determining the sentiment tone behind a body of text. SA has been successfully applied in social media mining to sense users’ opinions and typically categorizes them into positive, negative, and neutral sentiments. Although effective and widely used, this analysis fails to capture the variety of positive and negative mental coping strategies that users share in their posts, as the expressions of these mechanisms are represented within semantic contexts that are beyond the polarity of the words.

Consequently, this work aims to create a novel machine-learning framework to assess mental health status based on the DRIVE model. The proposed framework extends the SA mechanisms to detect negative and positive coping strategies and attempts to find references to the resources and demands in the text. The framework can also provide a mechanism to train an MH related language model based on the generated dataset, allowing the language model to be DRIVE-coping domain specific. Thus, the research is significant as it would allow observers and healthcare providers to set preventive measures, plans, and policies in the early stages of mental health problems. To demonstrate the effectiveness of the coping analysis and detection module of the proposed framework, the work provided experimental evidence from X (formerly known as Twitter) platform posts using statistical analysis, text mining classifiers, and topic modeling. The results have illustrated the limitation of SA compared to the coping analysis, identified the coping strategies used by individuals during COVID-19, and analyzed the topics and phrases that were used to express the coping strategies. Our study delves into the effectiveness of the DRIVE model in mental coping and the potential insights gleaned from social media mining during the pandemic, applied to a sample of posts from the State of Kuwait. The justification for this study stems from the critical need to understand and enhance mental coping strategies during crises, such as the COVID-19 pandemic, particularly within a specific cultural context.

One specific knowledge gap our research addresses is the limited exploration of the DRIVE model’s applicability in a global crisis scenario within a local multi-cultural population. The DRIVE model takes a holistic approach to well-being, combining different essential elements such as stressors, resources, coping, and positive personality, among others. However, its efficacy and adaptability within Kuwaiti cultural contexts during a pandemic remain unexplored in the literature.

Furthermore, the utilization of social media mining to obtain mental health insights during crises represents a novel approach. By analyzing social media data, we aim to uncover patterns, sentiments, and coping strategies prevalent among the individuals during the pandemic. This approach bridges the gap between traditional research methodologies and real-time data analytics, offering valuable insights into mental health trends and support mechanisms. Our study contributes to the existing literature by contextualizing the DRIVE model within the unique challenges posed by the COVID-19 pandemic. By exploring the intersection of mental coping strategies and social media mining within diverse cultural contexts, we seek to provide actionable insights for mental health professionals, policymakers, and individuals navigating crises.

Research questions and contribution

This study seeks to create an artificial intelligence framework that helps to understand and classify the stressors, resources, and human response and choices of coping strategies. The study demonstrates the framework partially by identifying the coping strategies that individuals are using under long stressful conditions (COVID-19 lockdown).

Specifically, the research questions and contributions can be summarized as follows:

  1. Define an artificial intelligence framework that can identify and classify individual demands, resources, and coping strategies from text posted on social media.

     The article provides a theoretical framework to observe an individual’s mental status as expressed in text. The framework is designed to extract the features required to detect an individual’s resources, demands, and coping strategies based on the DRIVE model. The experiments also demonstrate the predictive reliability of the coping strategy classification component of the framework.

  2. Identify the coping strategies that are used by individuals during stressful situations.

     The article provides a quantitative analysis and distribution of the coping strategies that were found in individuals’ posts on the platform X during COVID-19.

  3. Provide evidence that in-text coping analysis is more effective in detecting the mental status and performs better than pure SA.

     The article analyzes the distribution of SA labels within each coping category and shows the limitation of SA in detecting a number of positive and negative coping strategies. The work also provides examples of tweets with negative or neutral SA labels that are found to represent positive coping categories (and vice versa). This demonstrates that in-text coping analysis is better than pure SA.

  4. Find the set of words and topics that people have used on social media to express their coping strategies with COVID-19-related stress.

     The article lists original examples of tweets for each coping strategy, which reflect a set of expressions of their emotions and coping strategies. The word bigram analysis also provides a visual view of the most frequent word structure of each coping category.
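The comparison behind the third contribution can be pictured as a cross-tabulation of SA labels within each coping category. The following is a minimal Python sketch of that idea; the function name `sentiment_by_coping` and the sample records are hypothetical and are not taken from the study's data.

```python
from collections import Counter, defaultdict

def sentiment_by_coping(records):
    """Cross-tabulate sentiment labels within each coping category."""
    table = defaultdict(Counter)
    for coping_label, sentiment_label in records:
        table[coping_label][sentiment_label] += 1
    return {coping: dict(counts) for coping, counts in table.items()}

# Hypothetical (coping category, SA label) pairs, for illustration only.
records = [
    ("positive coping", "negative"),
    ("positive coping", "positive"),
    ("positive coping", "negative"),
    ("negative coping", "neutral"),
]
table = sentiment_by_coping(records)
print(table["positive coping"])
```

A category whose posts are spread across several SA labels, as in the invented "positive coping" row here, is the kind of pattern that would indicate SA alone cannot separate coping classes.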

The rest of this article is organized as follows. The following section provides a theoretical background of the DRIVE model and text-based social media mining. Next, a threefold literature review of related previous work is provided. First, the literature that applies the DRIVE model to evaluate MH status is given. Second, the main research that applies text-based social media mining to MH-related problems is discussed. This is followed by a review of machine-learning approaches that specifically analyze coping strategies expressed in social media posts in terms of descriptive models, statistical analysis, or qualitative data analysis. Then, the proposed framework for the DRIVE-Coping machine learning system is explained, followed by the experimental design. Experimental results and discussions are given before concluding with the main findings and listing any limitations.

Background

The conceptual approach of the study evolves from the DRIVE psychological well-being model. Thus, the DRIVE model is explained first to provide the psychological foundation of the proposed framework. Then, the main machine learning methodologies that are applied in text-based social media mining are listed and briefly explained. The section concludes with a review of the most related work in three main areas of research: the application of the DRIVE model to evaluate the MH status, the identification of the coping strategies in social media, and the use of text-based social media mining for psychological and MH-related problems.

DRIVE mental health model

Various indicators contribute to well-being, and combining these factors provides the most accurate prediction of outcomes. Well-being encompasses and is informed by diverse theories and models, such as the well-being appraisal framework, which includes multiple measures reflecting its various components (Hart & Wearing, 1995). The DRIVE model, developed by Mark & Smith (2008), offers an enhanced and flexible approach to well-being by incorporating subjective perceptions, resources, and individual differences. This conceptual framework combines elements from earlier models, including the Demand Control Support (DCS) model, the Effort-Reward Imbalance (ERI) model, coping behavior, attributional explanatory styles, and outcomes such as job satisfaction, depression, and anxiety (Mark & Smith, 2008).

The DRIVE model effectively predicts the effects of individual differences and work characteristics, such as coping style, on outcomes like depression, anxiety, and job satisfaction. However, there are uncertainties regarding moderating relationships (Mark & Smith, 2008; Mark & Smith, 2012a, 2012b). The DRIVE model has been supported in various contexts, including studies on the psychosocial effects on migrant workers in Italy (Capasso, Zurlo & Smith, 2016a, 2016b, 2018), UK postgraduate psychology students and nurses (Galvin et al., 2015), and university staff in the UK (Williams & Smith, 2016). Evidence for moderation effects has been limited (Galvin et al., 2015; Williams & Smith, 2016), with few interactions between predictor variables. The strength of the DRIVE model lies in its simplicity and flexibility making it a practical and multi-dimensional tool.

In particular, demands refer to external stressors, challenges, and obstacles that individuals encounter in their environment (Macía et al., 2021). These can include work-related pressures, interpersonal conflicts, health issues, financial difficulties, or major life events. Demands vary in intensity and can trigger coping responses in individuals. On the other hand, resources encompass the internal and external assets that individuals can utilize to cope with demands effectively. These include personal strengths, social support, coping skills, and environmental supports. Resources play a crucial role in buffering the impact of stressors and promoting resilience.

However, individual effects acknowledge that people vary in their coping styles, preferences, and responses to stress (Johnson & Smith, 2021). Factors such as personality traits, cultural backgrounds, past experiences, and coping strategies influence how individuals navigate and cope with stressful situations. Effects encompass the subjective perceptions, interpretations, and meaning-making processes that individuals engage in when encountering stressors (Lakey & Cronin, 2017). How individuals appraise and interpret stressors influences their coping strategies, emotional responses, and well-being outcomes.

Recent research has focused on several coping theories, including Lazarus and Folkman’s Transactional Model of Stress and Coping (Lazarus & Folkman, 1984), which emphasizes the dynamic interaction between individuals and their environment. This line of work has also explored cultural influences on coping strategies (Matsumoto, 2010; Lazarus, 1993) and has contributed the Cognitive-Behavioral Coping Theory, which highlights the role of cognitive appraisal in choosing coping strategies. Research has also examined the effectiveness of cognitive-behavioral coping interventions in various populations (Hofmann et al., 2012). Folkman & Lazarus (1980) distinguished between problem-focused coping (addressing stressors directly) and emotion-focused coping (managing emotional reactions), with recent investigations exploring the balance between these strategies and their impact on well-being (Zimmer-Gembeck & Skinner, 2023).

Additionally, the Transactional Model of Coping provided by Skinner et al. (2003) expanded on these theories by integrating cognitive appraisal, coping strategies, and outcomes into a dynamic framework, with applications in understanding coping in diverse contexts (Carver & Connor-Smith, 2010). These theories provide valuable insights into how individuals navigate stressors and manage their responses, contributing to the development of effective coping interventions in psychology.

Coping strategies are categorized in various ways across the literature, reflecting different dimensions of coping behavior (Skinner et al., 2003; Carver & Connor-Smith, 2010). Problem-focused coping strategies refer to active engagement aimed at addressing the stressor and finding a solution; meanwhile, emotion-focused coping strategies aim at managing the emotional reactions to a stressor. Adaptive coping strategies lead to positive outcomes and well-being, whereas maladaptive coping strategies can exacerbate stress-related difficulties. Mindfulness practices involve focusing on the present moment without judgement, which helps to increase self-awareness and reduce stress reactivity (Keng, Smoski & Robins, 2011). Problem-solving refers to actively addressing the stressors and finding a solution via a systematic process of goal-setting and seeking problem-solving steps (Nezu, Nezu & D’Zurilla, 2012). Emotion regulation strategies include cognitive reappraisal (changing one’s cognition to change one’s emotions), expressive suppression (literally suppressing or controlling one’s emotions), and emotion acceptance (accepting one’s feelings and not trying to control them) (Gross, 2015). Social support is having the needed psychological and material support to help an individual cope with stress. It plays a vital role in resilience and moderately influences coping effectiveness (Lakey & Cronin, 2017). Positive reappraisal, which refers to viewing a stressful situation in a positive light, fosters resilience, psychological well-being, and coping efficacy (task-related coping) (Tugade & Fredrickson, 2004). These coping strategies are an integral part of managing stress and one’s reaction to stressors, and of supporting psychological well-being and resilience in individuals.

Text-based social media mining

The expansion of social media has led to the development of major techniques in the field of text mining (Karami et al., 2020). The main text-based techniques include: SA or opinion mining, ED, topic modeling and content analysis, event or trend detection, and human behavior analysis.

The SA methods, most notably VADER (Hutto & Gilbert, 2014), SentiWordNet (Sebastiani & Esuli, 2006), and TextBlob (Loria, 2018), are lexicon-based NLP techniques that aim to classify the emotional value of a text expression into one of several sentiment classes (positive, negative, and neutral). To improve its performance, SA has been expanded by deploying supervised machine learning models and by improving its lexicon-based dictionary. ED is an extension of SA that is also a keyword-search based approach. It searches for emotion keywords that are assigned to psychological states, such as sadness, joy, fear, disgust, and anger, according to specific psychological models. Examples of classical ED models and emotion lexicon dictionaries can be found in numerous studies, e.g., Roberts et al. (2012), Mohammad & Turney (2013).
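To make the lexicon-based idea concrete, the following minimal Python sketch labels a text by summing word polarities from a small dictionary. The lexicon entries, the threshold, and the function name `lexicon_sentiment` are illustrative assumptions; they are not the actual resources or scoring rules of VADER, SentiWordNet, or TextBlob.

```python
import re

# Toy polarity lexicon (hypothetical values, for illustration only).
TOY_LEXICON = {
    "good": 1.0, "happy": 1.0, "grateful": 1.0, "hope": 0.5,
    "bad": -1.0, "sad": -1.0, "afraid": -1.0, "stress": -0.5,
}

def lexicon_sentiment(text, lexicon=TOY_LEXICON, threshold=0.0):
    """Return 'positive', 'negative', or 'neutral' from summed word polarities."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(lexicon_sentiment("I am sad and afraid, so I started painting every evening"))
```

In this invented example the post describes a constructive activity, yet a purely word-level score labels it negative because only "sad" and "afraid" are in the lexicon. This is the kind of case where polarity and coping behavior can diverge.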

Topic modeling, most notably the Latent Dirichlet Allocation (LDA) (Blei, Ng & Jordan, 2003) and the Structural Topic Model (STM) (Roberts et al., 2014), is a major application of text mining that is used to analyze the text content of social media posts. The aim is to learn the main topics of discussion by social media users (Laureate, Buntine & Linger, 2023). By analyzing the frequency of words and their co-occurrences in documents (social media posts), the topic modeling technique is able to identify the underlying themes of the set of documents. Another application of text mining in social media stems from the need to detect a topic or a theme from the text such as emerging events or trends (Afyouni, Al Aghbari & Razack, 2022), rumors and false information (Laureate, Buntine & Linger, 2023), or people’s stance (AlDayel & Magdy, 2021). The last related technique of text-based social media mining is human behavior detection, in which the emotional, social, and cognitive behaviors of humans are analyzed in the context of text mining (Gutierrez et al., 2021). The upcoming section provides a brief explanation of the most relevant work in which text-based social media mining techniques were applied in the context of MH.
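As a rough illustration of the co-occurrence signal that topic models exploit, the sketch below counts how often word pairs appear in the same post. It is not an implementation of LDA or STM; the function name `cooccurrence_counts` and the three sample posts are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count how many documents each unordered word pair appears in together."""
    counts = Counter()
    for doc in documents:
        # Use a sorted set so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts

# Hypothetical posts, already reduced to content words.
docs = [
    "lockdown walk park",
    "lockdown walk family",
    "vaccine news lockdown",
]
pairs = cooccurrence_counts(docs)
print(pairs.most_common(1))
```

Pairs that recur across many posts, such as "lockdown" and "walk" in this toy input, hint at a shared theme; a full topic model estimates such themes probabilistically rather than by raw counts.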

Related work

The related work is reviewed based on three main themes. First, the conceptual foundation of applying the DRIVE model to understand the definition of the well-being within the field of psychology is given. Second, the related work of applying text-based social media mining to mental health problems is explained. The third subsection is focused on reviewing the previous work on analyzing and detecting coping strategies in social media.

DRIVE model application for the evaluation of mental health status

This section focuses on previous research that demonstrated the importance of coping strategies in influencing well-being outcomes during times of stress. Positive coping strategies have been associated with greater psychological resilience, increased subjective well-being, and better overall mental health (Folkman & Moskowitz, 2004; Carver, 1997). Moreover, the effectiveness of coping strategies can vary depending on individual characteristics, the nature of stressors, and the social context (Folkman & Moskowitz, 2004). Cultivating adaptive coping skills through interventions focused on promoting positive coping strategies can enhance individuals’ ability to manage stress effectively and improve their overall well-being (Folkman & Moskowitz, 2004; Carver, 1997).

Coping with stress during the COVID-19 quarantine has been a significant challenge for many individuals worldwide. The unique circumstances of quarantine, including social isolation, uncertainty about the future, and concerns about health and safety, have led to heightened levels of stress and anxiety for many people (Brooks et al., 2020). In response to these challenges, individuals have employed various coping strategies to manage their stress and maintain their well-being. Overall, effective coping with stress during the COVID-19 quarantine involves a combination of positive coping strategies that promote resilience and well-being while avoiding negative coping strategies that may exacerbate stress and anxiety. It is essential for individuals to find coping strategies that work best for them and to seek professional help if needed to manage their mental health during this challenging time.

Text-based social media mining in MH-related applications

Text-based social media mining for psychological and MH-related problems has been a major body of research, most notably during and after COVID-19. The research can be categorized into three major themes: general emotion and sentiment analysis, MH disorder detection, and MH analysis and surveillance. This section provides an overview of each theme and its intersection with the proposed work.

The largest body of research in this field is the use of SA and ED. This genre aims to create a social sensor of people’s opinions and feelings, particularly in the presence of stressors during economic and health crises. The studies of Ali (2021), Althagafi et al. (2021), Anuratha et al. (2020), and Noor et al. (2022) provide examples of applying SA to analyze users’ opinions during COVID-19. Meanwhile, the works of Larsen et al. (2015), Deshpande & Rao (2017), and Sarsam et al. (2021) are examples of ED models and emotion lexicon dictionaries that are applied to MH-related problems. Although SA and ED are effective and widely used, the analysis falls short of detecting the variety of positive and negative mental coping strategies as defined in the DRIVE mental health model.

Using text posts on social media to detect MH disorders is the second major body of research. The main MH disorder detectors found in the literature are: depression detectors (see De Choudhury et al., 2013; Wongaptikaseree et al., 2020; AlSagri & Ykhlef, 2020; Chiong, Budhi & Dhakal, 2021), suicide detectors (see Vioules et al., 2018; Shing et al., 2018; Rabani et al., 2023), stress, anxiety or post-traumatic stress disorder detectors (see Reece et al., 2017; Guntuku et al., 2019; Yang et al., 2022; Juhng et al., 2023), and bipolar disorder detectors (see Huang et al., 2021; Kadkhoda et al., 2022). While the techniques from these MH detector tools are essential and beneficial, they delay interventions to very late stages. By then, treatment plans are already in place for established mental illnesses. Mental disorders can be traced and predicted from text weeks or months before symptoms appear (Gutierrez et al., 2021). A monitoring tool that operates in a normal mode rather than an emergency mode would therefore allow observers and healthcare providers to set preventive measures, plans, and policies at an early stage, once demands are detected or a lack of resources is identified.

The third category of research, which is the most relevant to the current work, focuses on building a public mental health analytical and surveillance tool within social network platforms. By applying SA, these approaches analyze the polarity and emotions of social media posts to identify emotionally harmful content (e.g., Khasnis, Sen & Khasnis, 2021; Benrouba & Boudour, 2023), to analyze and monitor the population’s emotions and mental health (e.g., Zhunis et al., 2022; Alavijeh et al., 2023), or to categorize and describe the traits of at-risk individuals (e.g., Hinduja et al., 2022; Yang et al., 2022; Sadasivuni & Zhang, 2022).

Coping strategy analysis and detection in social media

A number of studies have been conducted to analyze how people cope in stressful events and to link coping with the use of social networks. From a psychological perspective, Wolfers & Utz (2022) proposed a theoretical framework that describes the relationship between social media and coping in stressful events. The framework illustrates how social media can play three different roles in the stress-coping process: as stressors, as resources, and as coping tools.

In addition, a number of articles have created descriptive models and/or performed statistical analysis on social media posts to quantify and categorize the coping strategies in the text of each post. For example, Gaspar et al. (2016) provided one of the earliest studies that suggested the need for extending computer-based SA to look deeper into the perception of stressful events at the individual and society levels. Based on the coping model of Skinner et al. (2003), the authors manually classified the coping strategies in a set of tweets that were posted on X during a food crisis in Germany. The work analyzed the sentiment of what was identified by the coders as affective expressions and categorized them into one of the twelve coping strategies depending on whether the event was perceived as a threat or as a challenge. Cmeciu & Coman (2018) analyzed a small set of tweets to identify the coping strategies, as defined in Duhachek (2005), in addition to the specific topics/issues that the tweets are linked to. Brummette & Sisco (2015) performed a similar quantitative analysis on a set of tweets in addition to the categorization of the expressed emotions to assess the public’s collective sentiment and constructed messages that helped them cope in a crisis. Although these approaches resulted in descriptive models of the coping that users employed when facing emerging demands, the proposed work extends them by building, on top of the human-based assessment, a computer model that is capable of performing descriptive and predictive roles. In addition, the DRIVE model is a holistic approach to understanding well-being demands and available resources, and provides an indication of the outcome of these factors on well-being, which differs from other coping models that were implemented in the reviewed approaches.

Several researchers have deployed machine learning to extend the SA to look deeper into the coping mechanism(s) that users experience during stressful events. El-Masri et al. (2021) studied the emotions that motivated the citizens of Qatar to cope with the stressful event of the 2017 blockade. From a predefined set of eleven emotions, the authors created a weighted probabilistic classifier that would assign a probability of the emotion for each tweet given that it contained a specific word. By analyzing the emotions over time, the article identified positive emotions that were used to undo the negative emotions and considered it a coping strategy. Although it was expressed as coping, the work was analyzing the emotions rather than the coping strategies as defined in psychological models described in Skinner et al. (2003), Duhachek (2005), or Mark & Smith (2008). Mittal et al. (2021) did a qualitative data analysis of a small set of individuals’ tweets during the COVID-19 lockdown and classified them based on predefined coping themes. The authors also performed automated and manual SA and studied the frequency of the coping themes within each sentiment-based category. An interesting and related online stress-coping detector was proposed in Weng et al. (2022) who introduced a two-phased framework to construct a stress dataset of tweets and to extract coping responses. The first phase constructed the stress dataset using stress-related hashtags, personal pronouns, and emotion recognition. In the second phase, stress-coping tweets were extracted using bootstrapping-based patterns and semantic features. The proposed framework extends this work to learn the coping strategies, the stressors, and the resources of individuals.

DRIVE-model mental health observatory framework

Mental health evaluation and disorder detection mostly depend on self-reported scores, clinical interviews, and physical examinations such as electroencephalogram (EEG) images. The main disadvantages of these approaches are the need for equipment and/or for the individual to be physically present (Turcan & McKeown, 2019). As a further complication, identifying the (positive or negative) coping strategies is essential during the early stages of mental disorders or even at normal states of individuals’ mental health. These conditions make it difficult to access the required data or to contact the individuals in a timely manner. However, given the rapid increase of social media usage and the evident existence of users’ MH-related indicators in their social media posts, it is very advantageous to observe and detect coping strategies on social networks by analyzing these posted texts. This will allow early detection of MH concerns based on the type of coping strategies that are expressed in the posts, especially under stressful events or demands. Nonetheless, it is a challenging task as it depends solely on the text and its related features.

The proposed framework of coping observatory based on the DRIVE model is given in Fig. 1. The left part of the figure illustrates a conceptual model of how individual characteristics, personal resources, and demands interact to influence coping strategies. The model shows that resources and demands both influence and are influenced by coping outcomes, and that enhancing personal resources can promote positive coping in the face of challenging demands. The right part of the framework consists of four major modules. In general, the first module has three main tasks: collecting the user-generated text posts, pre-processing the text, and extracting their representative features. The process of collecting text posts is usually done by utilizing social media APIs and automated web scraping tools. The unstructured text data is preprocessed in preparation for analysis and feature engineering. This includes text cleaning, tokenization, part of speech (POS) tagging, stop-word removal, syntax parsing, in addition to stemming and lemmatization. The features can then be extracted from the preprocessed text. The features can be categorized based on the information that they represent. The content-based attributes are the main features of the framework and consist of the basic text information in addition to its linguistic characteristics and POS features. The second main set of features stems from the text sentiments. The framework incorporates the information generated by SA as part of the features, utilizing its power to detect the sensitivity and subjectivity of the text to enhance the learning model (Chiong, Budhi & Dhakal, 2021). Additional features can be extracted from the text content by identifying the context or topics from which the text was generated. The social media platforms provide two further essential sets of information. The first is related to the user who has generated the post, most commonly the name, username, bio description, country, date of birth, number of followers/following, and the profile picture. The second set of features is related to the post, including the post date, time, geo location and country, number of reposts/likes/quotes, and language. This data serves as input for examining individuals’ coping behaviors, enabling the development of personalized intervention plans and policy adjustments aimed at improving overall mental health outcomes.
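The cleaning, tokenization, and stop-word removal steps of the first module can be sketched as follows. This is a minimal illustration using only the Python standard library; the stop-word list, the decision to drop hashtags and mentions, and the function name `preprocess` are assumptions, and POS tagging, parsing, stemming, and lemmatization are omitted because they require a dedicated NLP toolkit.

```python
import re

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "am", "i", "to", "and", "of", "in", "at"}

def preprocess(post):
    """Clean a post, tokenize it, and drop stop words."""
    text = post.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # remove mentions and hashtags (an assumption)
    tokens = re.findall(r"[a-z]+", text)        # keep alphabetic tokens only
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(preprocess("Staying home and reading a book https://example.com #lockdown @friend"))
```

Whether hashtags should be removed or kept as features is a design choice; they are dropped here only to keep the example short.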


Figure 1: The proposed framework of coping strategy observatory and detection based on the DRIVE model.

The components that are illustrated in bold have been implemented in this study. Source Credit: Database, Heart and Building Icons, Microsoft.

The second module of the framework is designed to learn the coping strategies, demands or stressors, and resources that are expressed in the posts using the feature representations of Module (1). Based on the DRIVE model, the coping strategies are classified into two classes (positive and negative coping), which can be translated into a two-class classification problem. The coding also requires identifying the specific type of coping strategy under each coping class. The learning process requires an expert teacher to guide the automated detection techniques. This is mainly done by a human expert who codes a sample of texts to identify the coping strategy and its type in each post. As a further coding stage, semi-supervised learning can be achieved by utilizing unlabeled data to enhance the model that was constructed from the small set of labeled data. Using task-specific unlabeled data fine-tunes the model to the specific domain/task, which has been shown to enhance performance (Shi et al., 2023). Since the proposed observatory system handles a huge number of social media posts, semi-supervised learning techniques are essential to guarantee the scale and efficiency of the learning model.
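One common way to use unlabeled posts alongside a small expert-coded set is self-training, where a classifier adds its own confident predictions to the training data. The sketch below illustrates that scheme with a tiny Naive Bayes classifier; the source does not state which semi-supervised technique the framework uses, so the algorithm, the confidence threshold, and all function names here are assumptions.

```python
import math
from collections import Counter

LABELS = ("positive", "negative")

def train_nb(labeled):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    class_counts = Counter()
    for tokens, label in labeled:
        word_counts[label].update(tokens)
        class_counts[label] += 1
    vocab = set(word_counts["positive"]) | set(word_counts["negative"])
    return word_counts, class_counts, vocab

def predict_proba(model, tokens):
    """Return P(positive coping | tokens) with Laplace smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        score = math.log((class_counts[label] + 1) / (total_docs + 2))
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["positive"] / (weights["positive"] + weights["negative"])

def self_train(labeled, unlabeled, confidence=0.8, rounds=3):
    """Repeatedly add confidently pseudo-labeled posts to the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train_nb(labeled)
        remaining = []
        for tokens in pool:
            p = predict_proba(model, tokens)
            if p >= confidence:
                labeled.append((tokens, "positive"))
            elif p <= 1 - confidence:
                labeled.append((tokens, "negative"))
            else:
                remaining.append(tokens)
        if len(remaining) == len(pool):  # nothing new was labeled; stop early
            break
        pool = remaining
    return train_nb(labeled), labeled

# Hypothetical expert-coded posts and unlabeled posts (already tokenized).
coded = [
    (["exercise", "family", "hope"], "positive"),
    (["pray", "exercise", "grateful"], "positive"),
    (["drinking", "alone", "blame"], "negative"),
    (["blame", "giving", "up"], "negative"),
]
uncoded = [["exercise", "grateful", "walk"], ["alone", "blame", "tired"]]
model, expanded = self_train(coded, uncoded)
print(len(expanded))
```

The confidence threshold controls the trade-off between adding more pseudo-labeled data and propagating labeling errors, so in practice it would need to be validated against held-out expert codes.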

The second main task in the coping detection module is feature selection and extraction. Data mining techniques are used to select or extract the most relevant and informative features to build a more accurate and efficient learning model. A machine learning model is then selected and trained on the data to induce the prediction function. Content analysis and topic modeling of the classified text posts can be utilized to identify the main DRIVE-coping-related themes, keywords, and the main resources and demands that the users are experiencing. The determined themes and keywords can be used to generate a coping-word ontology and resources/demands-related datasets. The datasets can then be utilized in lexicon-based NLP or to fine-tune mental-health-specific language models, which form the third module of the proposed framework.

Recent advances in generative AI have resulted in a number of promising MH-related pretrained learning models (PLM) for classification and counseling purposes (Ji et al., 2023). Examples of MH-related PLMs include PsycBERT (Vajre et al., 2021), BHS-BERT (Naseem et al., 2022), and MentalBERT (Ji et al., 2022). The DRIVE-coping datasets that are generated in Module (2) can be used to pre-train a transformer within a coping language model to make it adaptive to this particular domain. When the model is set to a deployment mode, a monitoring dashboard can be created for healthcare providers and policy makers. The dashboard can be used for visualization and surveillance purposes to identify emerging or recurring stressors and to evaluate an individual’s available resources. Module (4) of the framework utilizes the model repository and learning models as an information retrieval system. Given an input such as keywords, a prompt, a search criterion, a user account, and/or a social media post or a hashtag, the learning model assesses the current resources and demands and generates the predicted type, category, and level of coping of the individual. Furthermore, by including the users’ timelines, the language model can generate, for example, a prediction of the expected mental illnesses and the vulnerable individuals based on the expressed coping type and on the identified resources and emerging demands.

However, privacy concerns pose critical challenges across the framework’s operations in general and during its online deployment in particular. Given the sensitivity of the data and the MH-related predictions and applications, the safety, security, reliability and consistency of the data are essential to safeguard the users and ensure a reliable application of the information (Ali et al., 2023; Hua et al., 2024). The framework must be equipped with privacy-preserving techniques including data encryption (Li & Li, 2015), differential privacy (Ficek et al., 2021), and multiparty computation (Reich et al., 2019).

Methods

As proof of concept, the framework was tested to demonstrate its ability to detect the coping mechanisms people used while experiencing the stressful conditions of COVID-19. The components of the model that have been implemented in this study are illustrated in bold in Fig. 1. These include: data collection, pre-processing, feature creation, human expert coding, feature selection/extraction, and coping detection. The rest of the DRIVE-Coping model components are left for future work. Ethical approval for this research was obtained from the Ethics Research Committee at Kuwait University and the assigned code is KU-CLS-20-06-23.

The data was collected using the X application programming interface (API). The collected data was filtered using COVID-19-related keywords such as “coronavirus,” “COVID-19,” “corona,” “omicron,” “vaccine,” and “quarantine,” among others. The geographic location of the tweets was set to the State of Kuwait, and the time frame was from November 1, 2019 to mid-May 2022. A total of 85,634 tweets were collected, spanning 45 languages and including 6,397 posts in English. News accounts were removed as the focus was on individuals’ opinions and perceptions. The data included several features related to the individual who posted the tweet, such as the number of following, the number of followers, and bio description. A set of features related to the tweet itself, such as reply count, like count, retweet count, date, time, and geographical location, was also used. More features were extracted from the tweets’ text, specifically the word count and Term Frequency Inverse Document Frequency (TF-IDF) (Luhn, 1957; Sparck Jones, 1972).
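The two text-derived features can be illustrated with a short sketch. The study does not state which TF-IDF variant was used, so the textbook unsmoothed form is assumed here (term frequency times the log of the inverse document frequency); the whitespace tokenizer and the function names are likewise illustrative. The three sample tweets are taken from Tables 8 and 10.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def word_count(text):
    """Number of whitespace-separated words in the tweet."""
    return len(text.split())

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights

tweets = ["enough with the corona talk", "corona sucks", "back to quarantine"]
weights = tf_idf(tweets)
counts = [word_count(t) for t in tweets]
```

In the second tweet, "sucks" receives a higher weight than "corona" because "corona" also appears in another tweet and is therefore less distinctive.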

A sample of 1,000 tweets was selected randomly out of the 6,397 English tweets to be annotated by the team. Each tweet was labeled “positive coping” or “negative coping”, in addition to the subcategories describing the particular positive or negative coping strategy. Based on the coping theory, the positive coping strategies used for the subcategories were: “problem-solving”, “seeking social support”, “positive reframing”, “engaging in relaxation techniques”, and “mindfulness practices”. Negative coping strategies that were adopted for the experiment were: “avoidance”, “denial”, “wishful thinking”, “self-blame”, and “venting of emotions”.

Statistical analysis was done to explore the dataset. This included the distribution of the classes and their sub-categories, a time series analysis of the class distribution over time, and a comparison of the proposed classes against the SA classes. The comparison was supported with examples of tweets to show the strength of the proposed coping-based classification over the SA classification. The LDA method was also performed for each of the positive and negative coping subsets independently to further demonstrate the main topics and understand the context.

The data pre-processing was performed using the Natural Language Toolkit (NLTK) platform (Loper & Bird, 2002). First, the emoticons and emojis were handled by replacing them with descriptive text. This was followed by the removal of the stop-words as well as the hashtags, mentions, and URLs. This enabled the data to be input into an NLP model. The SA was done using two techniques: VADER and TextBlob.
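The cleaning steps above can be sketched with the standard library alone. This is an approximation under stated assumptions rather than the study's NLTK pipeline: the emoticon table and stop-word list are tiny hypothetical stand-ins, and the choice to keep only alphabetic tokens is an assumption.

```python
import re

# Hypothetical, deliberately tiny lookup tables; a real pipeline would use
# NLTK's stop-word corpus and a fuller emoticon/emoji dictionary.
EMOTICONS = {":)": "smile", ":(": "sad", "❤": "heart"}
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "i", "my", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[@#]\w+")  # mentions and hashtags

def preprocess(text):
    # 1. Replace emoticons and emojis with a descriptive word.
    for symbol, word in EMOTICONS.items():
        text = text.replace(symbol, f" {word} ")
    # 2. Drop URLs, then hashtags and mentions.
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    # 3. Lowercase, keep alphabetic tokens, and remove stop words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

example = "@XYZ The best coffee :) #coffee https://example.com"
cleaned = preprocess(example)
```

Note that removing hashtags entirely, as described in the text, also discards any topical words they carry; splitting hashtags into words would be an alternative design.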

To reduce the number of features and utilize the most informative ones, feature engineering was applied to select and extract the best set of features. Correlation analysis was performed to identify the features that correlated most with the coping classes. A description of the features most related to the target class is shown in Table 1.

Table 1:
A brief description of the main features selected after correlation analysis.
Feature Description
Author Related
Following count It refers to the number of other Twitter accounts that a particular user is following.
Post Related
Like count It represents the number of times a tweet has been liked by other users. Like count can provide valuable insight into the sentiment and emotional impact of tweets.
Quote count It indicates the number of times a tweet has been quoted by other users. The quote count can be seen as an indicator of the tweet’s impact on prompting further engagement and discussion among users. It reflects the level of interest and involvement the tweet has generated within the Twitter community.
Reply count It refers to the number of times a tweet has received direct responses or replies from other users. Higher reply counts suggest that the tweet has stimulated discussion, elicited responses, and encouraged interaction among users. It reflects the tweet’s ability to capture attention, provoke thoughts, or evoke emotions that prompt others to engage in direct dialogue.
Retweet count It represents the number of times a tweet has been shared or retweeted by other users. Each retweet represents a form of endorsement or agreement with the original tweet’s content, as users choose to share it with their own followers.
Hour It indicates the hour of the day when the tweet was posted. This feature can offer valuable insights into users’ behavioral patterns and emotional states at different times of the day.
Sentiment Analysis (SA) Related
TextBlob polarity The polarity score is a numerical value ranging from −1 to 1, where −1 signifies negative sentiment and +1 denotes positive sentiment.
TextBlob subjectivity The subjectivity score, ranging from 0 to 1, indicates the level of personal opinion within a text. A higher subjectivity score, nearing 1, suggests that the text is predominantly composed of personal opinions rather than factual information.
VADER pos This score represents the proportion of the text that is rated as positive.
VADER neu This score represents the proportion of the text that is rated as neutral.
VADER neg This score represents the proportion of the text that is rated as negative.
VADER compound The compound score is derived by summing the valence scores of the words in the text and normalizing the sum. It ranges from −1 (indicating extremely negative sentiment) to +1 (representing highly positive sentiment), reflecting the overall sentiment intensity.
Natural Language Processing (NLP) Related
TF-IDF It is a metric that evaluates the importance of a term in a specific tweet compared to its prevalence across a collection of tweets.
Word count The number of words present in the tweet. This count is computed by splitting the tweet text into individual words and counting the total number of words. It provides a simple measure of the length or complexity of the tweet’s content.
DOI: 10.7717/peerj-cs.2828/table-1
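The correlation screening that produced the features in Table 1 can be sketched as follows. The study does not name the correlation coefficient, so Pearson's r is assumed; with a 0/1 class label it coincides with the point-biserial correlation. All feature values below are made up for illustration and are not the study's data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; with a 0/1 label this is the point-biserial correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

# Made-up toy values, for illustration only.
features = {
    "vader_compound": [0.6, 0.4, -0.5, -0.7, 0.1, -0.2],
    "hour": [9, 22, 13, 9, 22, 13],
    "word_count": [12, 12, 12, 12, 12, 12],
}
negative_coping = [0, 0, 1, 1, 0, 1]  # 1 = negative coping, 0 = positive coping

# Rank features by the absolute strength of their correlation with the class.
ranked = sorted(
    ((name, pearson(values, negative_coping)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
```

Ranking by absolute value keeps strongly negative correlations, which are as informative for a classifier as strongly positive ones.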

The XGBoost tree boosting system (Chen & Guestrin, 2016) was used to rank the features, which helped eliminate the less important ones. Figure 2 lists the features in descending order of their gain-based feature importance (or the F-score), which XGBoost uses to evaluate a feature’s importance. The gain quantifies the improvement in model accuracy achieved by splitting on a particular feature, which makes it a reliable measure of that feature’s importance in prediction. Higher gain values indicate features that contribute more to the predictive power of the model. According to XGBoost, the five features that had the highest importance scores were: the “following count”, “TextBlob subjectivity” score, “VADER compound” score, “TextBlob polarity” score, and the hour of the post.

Figure 2: The dataset features ordered by their importance based on XGBoost F-score.

These features, in addition to the text-related features, were selected for training and testing the coping prediction machine learning models. Different combinations of these features were tested to select the set that resulted in the best prediction. Table 2 shows the different combinations of the most important features that were used for the training and testing, starting with a single text feature (TF-IDF) and adding one feature at a time. Two additional feature sets were used. The first included the user and post related features in addition to the text. The second included the text and the SA related features.

Table 2:
Combinations of the high gain features used for training and testing machine learning models.
Feature I II III IV V VI VII VIII IX
TF-IDF
Word count
Following count
TextBlob subjectivity
VADER compound
TextBlob polarity
Hour
DOI: 10.7717/peerj-cs.2828/table-2
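The incremental construction of the feature sets can be sketched as below. It assumes that Sets I to VII add features in the row order of Table 2, which is consistent with the description of Set II as containing the text features only. Sets VIII and IX are omitted because their exact membership is not spelled out in the text.

```python
from itertools import accumulate

# Row order of Table 2 (assumed to be the order in which features were added).
features = [
    "TF-IDF", "Word count", "Following count", "TextBlob subjectivity",
    "VADER compound", "TextBlob polarity", "Hour",
]
numerals = ["I", "II", "III", "IV", "V", "VI", "VII"]

# accumulate() with list concatenation yields [f1], [f1, f2], [f1, f2, f3], ...
cumulative = accumulate(([f] for f in features), lambda acc, nxt: acc + nxt)
feature_sets = dict(zip(numerals, cumulative))
```

Each resulting list could then be used to select the columns passed to a classifier pipeline.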

To prepare the features for the machine learning models, categorical variables such as the “following count” and “like count” were encoded into numerical representations. Similarly, the numerical features, such as the “TextBlob subjectivity” and “VADER compound,” were scaled using the min-max technique. The dataset was used to train a variety of classical and state-of-the-art machine learning classifiers to build a coping detection system. The classifiers tested include a rule-based method (decision tree), a regression method (logistic regression), an instance-based method (K-nearest neighbor), a statistical method (naïve Bayes), a support vector machine (SVM), and two ensemble methods, random forest and XGBoost. A summary of the classification models and their description can be found in Table 3.

Table 3:
Summary of the machine learning models used for coping strategy classification.
Machine learning model Description
Decision Tree A decision tree classifies data by iteratively splitting it based on specific feature values, resulting in a tree-like structure where each terminal node represents a class (Breiman et al., 1984).
K-Nearest Neighbor (KNN) KNN is a non-parametric method that classifies new data points based on their proximity to existing labeled examples in the dataset (Cover & Hart, 1967).
Logistic regression Logistic regression models the probability of a binary outcome by establishing a relationship between the input features and a binary response variable (Cox, 1958).
Naïve Bayes Naïve Bayes is a probabilistic classifier that applies Bayes’ theorem under the assumption that each feature contributes independently to the outcome (Langley, Iba & Thompson, 1992).
Random forest Random Forest is an ensemble method that constructs multiple decision trees and aggregates their predictions to arrive at a final classification (Breiman, 2001).
Support Vector Machine (SVM) SVM is a classification algorithm that identifies the optimal hyperplane that maximally separates data points from different classes (Cortes & Vapnik, 1995). SVM looks at data and tries to find the best boundary between groups, like those using positive or negative coping strategies. It finds the line that best separates the two types of behavior.
XGBoost XGBoost is a scalable, efficient implementation of gradient boosting that constructs decision trees sequentially, with each tree correcting errors from the previous ones (Chen & Guestrin, 2016).
DOI: 10.7717/peerj-cs.2828/table-3
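The min-max scaling applied to the numerical features before training maps each value x to (x − min) / (max − min). A minimal sketch follows; the sample scores are made up, and the handling of constant columns is an assumption.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

# Made-up VADER compound scores, which natively lie in [-1, 1].
compound = [-1.0, -0.5, 0.0, 0.5, 1.0]
scaled = min_max_scale(compound)
```

Scaling matters most for distance-based classifiers such as K-nearest neighbor, where an unscaled feature with a large range (for example, following count) would otherwise dominate the distance. In a train/test setting, the minimum and maximum would normally be taken from the training split only.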

Results

The results are divided according to the analysis of the research questions. First, the feasibility of the framework, specifically the coping detection, is proven via a number of machine learning algorithms. Second, the coping strategies used by individuals during COVID-19 are analyzed. Third, qualitative and quantitative comparisons between coping analysis and SA are made to demonstrate the limitation of SA in expressing the status of MH and the coping outcomes. Lastly, bigram analysis and LDA modeling are employed to examine the vocabulary and topics utilized by individuals to express their coping strategies.

Framework evaluation

The feasibility of the framework was studied by developing and evaluating the proposed predictive coping classification model. The objective of the model is to take user posts as input features and to predict the type of coping strategy (positive or negative). To train the model, the data was divided into 80% for training and 20% for testing. Each combination of features in Table 2 was fitted into one pipeline and used as input for the classifier models. Table 4 shows the prediction accuracy for each classifier and feature set, together with the averages per classifier (rows) and per feature set (columns). On average, Feature Set II, which included the text features only, had the greatest accuracy, followed by Feature Sets IV, V, and VI, which included the text features, the user’s number of following, and different subsets of SA features. On average, K-nearest neighbor performed better than the other classifiers, followed by naïve Bayes. The greatest single accuracy was achieved by K-nearest neighbor on Feature Set V. Since the objective was to demonstrate the effectiveness of the coping framework, exploring other feature engineering approaches, learning models, and the use of LDA topics or other word embedding techniques to enhance model performance is left for future work.

Table 4:
Machine learning models accuracy scores comparison with averages.
The highest accuracy and the highest averaged accuracies are highlighted in bold.
Machine learning model I II III IV V VI VII VIII IX Avg
Decision tree 71 73 73 73 70 69 67 73 69 70.89
K-Nearest Neighbor 73 77 77 79 82 81 77 75 78 77.67
Logistic regression 76 79 75 75 75 76 75 73 76 75.56
Naïve bayes 75 79 75 78 78 77 77 76 78 77.00
Random forest 71 76 77 76 76 76 75 76 75 75.33
Support vector machine 75 79 74 74 77 75 74 73 74 75.00
XGBoost 69 73 74 73 70 73 72 75 70 72.11
Avg 72.86 76.57 75.00 75.43 75.43 75.29 73.86 74.43 74.29
DOI: 10.7717/peerj-cs.2828/table-4
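The row and column averages in Table 4 can be reproduced directly from the per-cell accuracies. The sketch below uses the values from the table; the variable names are illustrative.

```python
feature_sets = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]

# Accuracy (%) per classifier and feature set, copied from Table 4.
accuracy = {
    "Decision tree":          [71, 73, 73, 73, 70, 69, 67, 73, 69],
    "K-Nearest Neighbor":     [73, 77, 77, 79, 82, 81, 77, 75, 78],
    "Logistic regression":    [76, 79, 75, 75, 75, 76, 75, 73, 76],
    "Naïve bayes":            [75, 79, 75, 78, 78, 77, 77, 76, 78],
    "Random forest":          [71, 76, 77, 76, 76, 76, 75, 76, 75],
    "Support vector machine": [75, 79, 74, 74, 77, 75, 74, 73, 74],
    "XGBoost":                [69, 73, 74, 73, 70, 73, 72, 75, 70],
}

# Average per classifier (table rows) and per feature set (table columns).
row_avg = {model: round(sum(v) / len(v), 2) for model, v in accuracy.items()}
col_avg = {
    fs: round(sum(v[i] for v in accuracy.values()) / len(accuracy), 2)
    for i, fs in enumerate(feature_sets)
}

best_model = max(row_avg, key=row_avg.get)
best_set = max(col_avg, key=col_avg.get)
best_cell = max(
    ((model, fs, v[i]) for model, v in accuracy.items() for i, fs in enumerate(feature_sets)),
    key=lambda cell: cell[2],
)
```

The computed averages agree with the reported ones: K-nearest neighbor has the best classifier average (77.67), Feature Set II the best feature-set average (76.57), and the single best cell is K-nearest neighbor on Feature Set V (82).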

To further investigate performance, the dataset with the feature subset No. II in Table 2 was selected to demonstrate the effectiveness of the classifier, given that it resulted in the best accuracy on average among all datasets. A summary of the performance results for this case is given in Table 5. For each model, the accuracy and the weighted averages of precision, recall, and F-1 score are listed. Among the models, logistic regression, naïve Bayes, and SVM had the highest accuracy scores, at 79% each. Naïve Bayes and SVM stand out for the F-1 score, achieving 77% on average, which highlights their balance between precision and recall. Conversely, decision tree and logistic regression exhibited the lowest F-1 scores, at 67% and 69% respectively.

Table 5:
Summary of the classification performance for dataset II.
Machine learning model Accuracy Precision Recall F-1 Score
Decision tree 73 68 73 67
K-Nearest neighbor 77 75 77 73
Logistic regression 79 75 67 69
Naïve bayes 79 77 79 77
Random forest 76 77 76 70
Support vector machine 79 78 79 77
XGBoost 73 70 73 71
DOI: 10.7717/peerj-cs.2828/table-5

Table 6 shows a comparison of the two models that had the highest accuracy scores among all datasets. These were the K-nearest neighbor trained on the dataset with the feature subset V and the naïve Bayes trained on the dataset with the feature subset II. The confusion matrices of the models are shown in Fig. 3. The class “True” corresponds to the negative coping class. For the K-nearest neighbor, there were 128 true negatives (TN) and 33 true positives (TP). The precision for the “False” class (i.e., positive coping) was 86%, meaning that a high proportion of the instances classified as positive coping were indeed positive coping. The recall for this class was slightly higher, at 89%. However, for the “True” class, i.e., negative coping, the precision was lower, at 67%, while the recall was 62%.

Table 6:
A comparison of the classification performance metrics of the K-nearest neighbor model on Dataset V and the naïve Bayes model on Dataset II.
K-Nearest neighbor of dataset V Naïve Bayes of dataset II
Accuracy score is 0.82 Accuracy score is 0.79
Precision Recall F1-score Support Precision Recall F1-score Support
0 0.86 0.89 0.88 144 0 0.81 0.93 0.86 144
1 0.67 0.62 0.65 53 1 0.68 0.40 0.50 53
Accuracy 0.82 197 Accuracy 0.79 197
Macro avg 0.77 0.76 0.76 197 Macro avg 0.74 0.66 0.68 197
Weighted avg 0.81 0.82 0.81 197 Weighted avg 0.77 0.79 0.77 197
DOI: 10.7717/peerj-cs.2828/table-6

Figure 3: The confusion matrix of (A) the K-nearest neighbor model of Dataset V and (B) the naïve Bayes model of Dataset II.

For the naïve Bayes model trained on dataset II, the accuracy was 79%, which was slightly lower than the K-nearest neighbor model trained on Dataset V. The precision and recall for identifying instances of the positive coping were 81% and 93%, respectively. However, the precision and recall measures were lower for the negative coping instances, reaching 68% and 40% respectively.
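The per-class metrics reported for the K-nearest neighbor model can be recomputed from its confusion matrix. In the sketch below, TN and TP are the counts given in the text, while FP and FN are not stated there and are derived from the class supports in Table 6 (144 positive coping and 53 negative coping test tweets).

```python
def class_report(tp, fp, fn):
    """Precision, recall, and F1 for one class treated as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# K-nearest neighbor on Dataset V. "True" / 1 is the negative coping class.
tn, tp = 128, 33
fp = 144 - tn  # derived: positive coping tweets predicted as negative coping
fn = 53 - tp   # derived: negative coping tweets predicted as positive coping

negative_coping = class_report(tp, fp, fn)
# For the "False" / 0 class the roles swap: TN acts as TP, FN as FP, FP as FN.
positive_coping = class_report(tn, fn, fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
```

Rounded to two decimals, these values match the left half of Table 6: 0.86/0.89/0.88 for positive coping, 0.67/0.62/0.65 for negative coping, and an accuracy of 0.82.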

The receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate at various threshold settings. The closer the ROC curve is to the upper left corner of the graph, the better the performance of the model, which is summarized by the area under the curve (AUC). An AUC greater than 0.5 implies that the classifier has a better ability to distinguish between classes than a random guess, demonstrating acceptable predictive reliability (Fawcett, 2006). Figure 4 illustrates the ROC curves of the K-nearest neighbor model for Dataset V (left) and the naïve Bayes model of Dataset II (right). By approximating the AUC, the KNN model achieves a score of approximately 0.7, whereas the naïve Bayes model yields an AUC of around 0.6. Both the K-nearest neighbor and the naïve Bayes models demonstrate acceptable predictive reliability, as both curves lie above the diagonal that corresponds to random guessing (AUC = 0.5). The K-nearest neighbor model predicted the coping category better than the naïve Bayes model.

Figure 4: The ROC curve of (A) the K-nearest neighbor model of Dataset V and (B) the naïve Bayes model of Dataset II.
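The AUC has a convenient rank-based interpretation: it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way; the labels and scores are made-up toy values, not the study's model outputs.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: 1 = negative coping, scores = predicted probability of class 1.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

Here 9 of the 12 positive-negative pairs are ordered correctly, giving an AUC of 0.75. A classifier that assigns the same score to every instance gets exactly 0.5, the chance level referred to above.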

Identifying coping strategies

Examining the labeling of the random sample set revealed a majority of positive coping tweets, making up nearly 70% of the sample. Examples of posts that represent positive and negative coping strategies, based on the manual labeling by the team, are given in Tables 7 and 8, respectively. Any user or personal name, brand name, company name, commercial channel or show is shown as XYZ or an ellipsis (…) to maintain anonymity and ensure impartiality. The text of the tweets is kept in its original form except for a few spelling corrections. As the users are mostly non-native English speakers, there are some structural and grammatical issues.

Table 7:
Examples of tweets for each positive coping category.
Category Tweet
Mindfulness practices - May God grant you relief soon. Today is a day of stone
- @XYZ* Hey (…)*, you must not have clicked on the website. It’s called Coronavirus Disease 19 (COVID-19) you!.
- Even with the Coronavirus panic buying, nobody wants to eat Vegan food.
- I wonder what Corona (the 90s Band) are doing now.
- Wishing everyone in Kuwait a Happy National Day! In spite of the cancellation of all public events due to caution about corona virus, we celebrate 59 yrs of independence and tomorrow is 29 yrs since liberation from the 1990 invasion. I’m very grateful
- May God protect you and fulfill your dreams, (…)*
Engaging in relaxation techniques - Binge watching X in quarantine is the best thing I decided to do
- My last quarantine day, let the trips begin tomorrow
- The best farbuccino #coffee #XYZ* Cafe
- Our meeting tomorrow with the distinguished Dr. (…)*, Professor of Media and Public Relations at (…)* University. Our meeting entitled Rumors in the time of the new Corona crisis: How to confront
- Saturdays are for new nails and a hair mask
Positive reframing - 2.5 years and a pandemic later, I’m finally home
- Very positive initiatives. May God heal you all
- Let me sleep and when I am awake my quarantine ends
- @XYZ* I wish you a safe trip and have a good time. Corona virus changed our lives, erased all our programming and became trapped inside our bodies. steal your happiness and enjoy; life is too short.
- Let’s keep (…)* School COVID-19 Free!
Problem-solving - Some factors that would help you stay stronger during crisis are being able to make decisions quickly and having a well-tested crisis plan in place.
- V6. If you start developing shortness of breath or difficulty breathing, please visit your nearest hospital. Don’t waste time not getting any medical attention because you’re scared of corona virus exposure in hospitals. All MOH hospitals are taking precautionary measures.
- @XYZ* No, not necessary if there was proper PPE use and sanitization (my opinion, not looked into research yet). Again I am assuming competency and proper planning for this unfolding situation.
- The new mutant, Omicron, has arrived in Saudi Arabia. A case has been discovered, and this means the danger is approaching Kuwait. Yesterday, cases of infection with Corona increased, so caution must be exercised at land, sea and air ports, and adherence to health requirements, social distancing, wearing masks, and vaccinating the unvaccinated.
Seeking social help - China confirms the transmission of the new Corona virus to humans and infecting more citizens. Therefore, we call on the Arab health ministers to pay attention to the development of the dangerous and contagious Corona virus in China. There is a patient who caused the infection of an entire medical team consisting of one person. Monitor travelers to your countries carefully and examine them for the Corona virus.
- @XYZ* Hello What about Residents returning to UAE ?? If Booster Vaccination is done - are they allowed ?
- @XYZ* Sir can you please take up the issue of (X)* expats stuck in (…)* getting (…)* vaccine with Kuwait MoH. It looks its not clear if (…)* (vaccine) is accepted by Kuwait MoH despite it being same as (…)* vaccine.
DOI: 10.7717/peerj-cs.2828/table-7

Note:

*Brand, company, commercial shows, and personal names have been replaced by (…) to maintain anonymity and ensure impartiality.

Table 8:
Examples of tweets for each negative coping category.
Category Tweet
Venting of emotions - Corona sucks
- In my opinion what’s worse than coronavirus…are all those mentally ill people judgmental, hypocrites, ungrateful and so on. In our society because they don’t have a cure and their numbers keep growing.
- Sometimes you just hate idea of moving on after pandemic. Face masks genuinely feel like they’re slowly pealing my ears off
- 2 weeks ago since I tested positive for Covid and I still feel absolutely exhausted most evenings! I barely managed 20mins at the gym without feeling totally empty tonight
- Covid-19 probated a Fall Line: Omicron
- I genuinely hate the fact that because of Covid I never got to have either of my two graduations.
Avoidance - I swear to God, That is over #MinistryofHealth
- Enough with the corona talk
- If ONE MORE PERSON SENDS A CORONA VIRUS WARNING ON MY DADS WHATSAPP IM GONNA LOOSE IT
- Corona has me wanting to travel
- Screw COVID-19 he was supposed to be in Kuwait w weeks ago
Denial - Take the mask off when you speak to me
- I don’t mean to b rude, but is this Corona virus a biological weapon developed by the Pentagon?
- The pharmaceutical company that created #coronavirus I think you made your point. Its time to sell the antidote
- @(…)* Go to Paris because they’re the only country who have Therapy and cure the people from corona
- I think This coronavirus has already existed long ago and people have already been exposed to this virus, it’s just that they’re making it a bomb today.
- Would you still take the Corona Vaccine after watching the ending to this Corona Movie Trailer?
Wishful thinking - My life before the pandemic. #cogne #italy #mountaineering #alpine #climbing #france #chamonix
- Good morning Everyone after long time we are feeling boring How or when Corona will finished
- Sometimes I wish I could unsee the things I’ve seen in Covid wards
- A gateway in an island with limited people, gated community, no visitors, commodities delivered to a sanitized room, and good company of friends and family for 6 months. #Coronavirus
- First #coronavirus then this crap? #Yaravirus… Can someone find me another planet?
Self-blame - I wonder if my current brain fog is caused by delayed concussion, my period, or covid infection that i don’t know of. smh
- Return to God, the Corona epidemic and others will go away
- And We do not send signs to frighten you, and the soldiers of your Lord do not know, nor are they a reminder for human beings. Storms and floods, severe heat, diseases carried by animals and winds that come. Man knows and does not know. Viruses that are seen but not seen. Pandemic after pandemic and diseases that were in our ancestors through to the possessors of understanding. And for the oppressors to repent, and for the tyrants to crush them.
DOI: 10.7717/peerj-cs.2828/table-8

Note:

*Brand, company, commercial shows, and personal names have been replaced by (…) to maintain anonymity and ensure impartiality.

The distribution of the coping categories is given in Fig. 5. It can be seen that the most used coping strategy is the positive “mindfulness practices” strategy, followed by “positive reframing” and “engaging in relaxation techniques”. The negative coping strategies (“venting of emotions”, “avoidance”, and “denial”) come next in frequency. The strategies that people used the least were “self-blame” and “wishful thinking” (negative coping) followed by “seeking social support” and “problem-solving” (positive coping).

Figure 5: The distribution of coping strategies in order of frequency.

The main techniques linked to the positive strategy labeled “mindfulness practices” were reciting prayers, meditation practices, expressing gratitude and blessings, and being sarcastic and humorous. The “positive reframing” strategy included practices related to work adaptation, spreading positive news, and having a positive perception of the events. Similarly, the “engaging in relaxation techniques” strategy involved self-care activities like spa and body treatments and participating in sports or entertainment activities. The negative “venting of emotions” strategy was connected to the spread of negative news or thoughts, while expressions of hopelessness were considered part of the negative “avoidance” strategy. Conspiracy theory talk was mainly connected to the negative coping strategy labeled “denial”. In parallel, any wish for how things were in the past (or will be in the future) or for impossible activities or events was linked to the negative strategy labeled “wishful thinking”. Any self-blaming statement that considered the pandemic a punishment was labeled as the negative “self-blame” coping mechanism.

Comparison between in-text coping analysis and SA

Comparing the coping mechanisms with the SA scores, Fig. 6 illustrates the frequency distribution of the VADER (left) and TextBlob (right) labels grouped by positive and negative coping categories. Assuming that positive coping and positive sentiment both represent a good mental health outcome (and vice versa), the SA method that assigns more positive labels under the positive coping category would be the better indicator of coping. The figures show that VADER is more aligned with the coping categories than TextBlob: 400 positive coping tweets were labeled positive by VADER, compared to nearly 340 by TextBlob. Approximately 150 negative coping tweets were labeled negative by VADER, compared to 100 by TextBlob. VADER also labeled fewer tweets as neutral than TextBlob did. Nonetheless, both SA methods failed to recognize a number of the expressed coping strategies, labeling tweets positive when they expressed negative coping and vice versa. Examples of tweets whose coping category was miscategorized by VADER and/or TextBlob are listed in Tables 9 and 10 for the positive and negative coping strategies, respectively. The examples provide evidence of how coping strategies that involve negative terminology can, given their context, confuse SA.

Figure 6: The distribution of SA labels within each coping category: VADER (A) and TextBlob (B).
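The comparison behind Fig. 6 amounts to cross-tabulating each tweet's coping class against its SA label. The sketch below illustrates the idea on five rows taken from Tables 9 and 10. Because those tables list only mismatches, the agreement rates computed here are not representative of the full sample; the helper `agrees` and its matching rule are assumptions.

```python
from collections import Counter

# (coping class, VADER label, TextBlob label) for a few rows of Tables 9 and 10.
rows = [
    ("positive", "Negative", "Neutral"),   # "Days without deaths, thank God"
    ("positive", "Neutral", "Positive"),   # "Saturdays are for new nails and a hair mask"
    ("positive", "Negative", "Positive"),  # "Negative PCR; out of quarantine. ..."
    ("negative", "Neutral", "Positive"),   # "Back to quarantine"
    ("negative", "Neutral", "Positive"),   # "Discrimination is the new norm #Covid19"
]

def agrees(coping, sentiment):
    """A sentiment label agrees with a coping class only if the polarities match."""
    return sentiment.lower() == coping

# Cross-tabulations: how often each (coping class, SA label) pair occurs.
vader_table = Counter((coping, vader) for coping, vader, _ in rows)
textblob_table = Counter((coping, tb) for coping, _, tb in rows)

vader_agreement = sum(agrees(c, v) for c, v, _ in rows) / len(rows)
textblob_agreement = sum(agrees(c, t) for c, _, t in rows) / len(rows)
```

Under this rule a neutral SA label never agrees with either coping class, which is one reason a sentiment score alone cannot stand in for the coping label.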

Table 9:
Examples of tweets with positive coping expressions and negative or neutral SA.
Tweet | Coping category | VADER category | TextBlob category
- Days without deaths, thank God | Positive reframing | Negative | Neutral
- Oh God, make the days pass by, they hurt, they give, they take, they rejoice, they grieve, the Most Merciful of the Merciful, Corona | Mindfulness practices | Negative | Positive
- Saturdays are for new nails and a hair mask | Engaging in relaxation techniques | Neutral | Positive
- 2. Wear a mask if you work in hospitals or are dealing with sick individuals, anyone that’s coughing or complaining of flu like symptoms. | Positive reframing | Negative | Negative
- my last quarantine day, let the trips begin tomorrow | Engaging in relaxation techniques | Negative | Neutral
- Negative PCR; out of quarantine. Time to enjoy what’s left of the weekend before term starts Sunday. | Engaging in relaxation techniques | Negative | Positive
- #corona symptoms-One of em is taste loss. How can one verify that it’s a corona symptom while eating wife cooked meal? | Engaging in relaxation techniques | Negative | Neutral
- Oftentimes, I need long space alone with my self to know who am I become. #aoukw #aouthoughts | Engaging in relaxation techniques | Negative | Negative
- Panic and fear, whether Corona or something else. God said, “Say, God’s decrees, our Master, will afflict us. And in God, let the believers put our trust.” | Engaging in relaxation techniques | Negative | Neutral
- Tell me your attitude towards the Chinese people nowadays and I’ll tell you who you are. Seriously stop the hate. Let’s beat Corona Virus together. #PrayForChina. #WagPraning | Positive reframing | Negative | Negative
- #socialmedia bad impact is spreading wrong news that might at many times cause fear and #frustration therefore always make sure to check the source of #information | Problem-solving | Negative | Negative
DOI: 10.7717/peerj-cs.2828/table-9
Table 10:
Examples of tweets with negative coping expressions and positive or neutral SA.
Tweet | Coping category | VADER category | TextBlob category
- Next week we will complete 4 months and nothing has been done | Avoidance | Neutral | Positive
- Back to quarantine | Venting of emotions | Neutral | Positive
- I suspect more COVID wards will open in hospitals all over Kuwait after this weekend | Avoidance | Neutral | Positive
- Discrimination is the new norm #Covid19 | Venting of emotions | Neutral | Positive
- Reminiscing on the past when touching outdoor items did not require sanitizing my life away | Wishful thinking | Neutral | Positive
- I miss running I really wanna go out for a run. I miss running in @Alshaheedpark | Wishful thinking | Negative | Positive
- It’s pouring down, I’m in quarantine and there is no power in my building. Happy Sunday folks! #sundayvibes | Avoidance | Positive | Positive
- The pharmaceutical company that created #coronavirus I think you made your point. Its time to sell the antidote | Denial | Positive | Neutral
- Sometimes I wish I could unsee the things I’ve seen in Covid wards | Wishful thinking | Positive | Neutral
- A gateway in an island with limited people, gated community, no visitors, commodities delivered to a sanitized room, and good company of friends and family for 6 months.. #Coronavirus | Wishful thinking | Positive | Positive
- Please don’t say this is true… f@@@ Corona | Wishful thinking | Positive | Positive
- I wonder if my current brain fog is caused by delayed concussion, my period, or covid infection that i don’t know of. smh | Self-blame | Neutral | Neutral
DOI: 10.7717/peerj-cs.2828/table-10

To look deeper into the performance of the SA within each coping strategy, a drill-down examination of the SA categories was done to identify the tweets for which the coping type and the SA category disagreed. Table 11 provides the number and percentage of tweets that TextBlob or VADER classified as negative or neutral sentiment while they expressed positive coping, and vice versa. A chart of the miscategorization statistics is given in Fig. 7. VADER had a lower average miscategorized percentage than TextBlob for both positive (44.5% vs 51.75%) and negative (51.76% vs 66.57%) coping strategies, a gap that may stem in part from TextBlob’s larger neutral class. The highest miscategorized percentage for TextBlob was found in the “wishful thinking” strategy (80%), and the lowest for VADER (37.31%) was found in the “positive reframing” strategy. Both SA methods had their largest miscategorization in the negative coping category, where it exceeded 50% of the negative data. The use of negation, subjectivity, and sarcastic expressions may have influenced the SA performance.

Table 11:
Statistics of tweets with miscategorized positive coping strategies with negative or neutral SA and negative coping strategies with positive or neutral SA.
Coping strategy | No. of miscategorized tweets by TextBlob | No. of miscategorized tweets by VADER | Total tweets of category | TextBlob miscategorized percent | VADER miscategorized percent
Positive Coping Strategy
Mindfulness practices | 141 | 104 | 265 | 53.21% | 39.25%
Positive reframing | 83 | 72 | 193 | 43.01% | 37.31%
Engaging in relaxation techniques | 66 | 60 | 123 | 53.66% | 48.78%
Problem-solving | 32 | 36 | 69 | 46.38% | 52.17%
Seeking social help | 25 | 18 | 40 | 62.50% | 45.00%
Total percent: TextBlob 50.29%, VADER 42.03%; average of strategy percentages: TextBlob 51.75%, VADER 44.5%
Negative Coping Strategy
Venting of emotions | 56 | 50 | 100 | 56% | 50%
Avoidance | 45 | 39 | 75 | 60% | 52%
Denial | 59 | 35 | 74 | 79.73% | 47.3%
Wishful thinking | 24 | 20 | 30 | 80% | 66.67%
Self-blame | 8 | 6 | 14 | 57.14% | 42.86%
Total percent: TextBlob 65.53%, VADER 51.19%; average of strategy percentages: TextBlob 66.57%, VADER 51.76%
DOI: 10.7717/peerj-cs.2828/table-11

Figure 7: The percentage of positive coping tweets with negative or neutral SA or negative coping tweets with positive or neutral SA per coping strategy based on TextBlob (light) and VADER (dark).
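The two summary rows of Table 11 can be reproduced from the per-strategy counts. The sketch below is only a worked check of the arithmetic: the counts are copied from Table 11, while the function and variable names are illustrative. It computes both the pooled percentage (all miscategorized tweets over all tweets in the group) and the unweighted mean of the per-strategy percentages, which is why the table reports two slightly different figures per tool.

```python
# (TextBlob miscategorized, VADER miscategorized, total tweets), copied from Table 11.
POSITIVE = {
    "Mindfulness practices": (141, 104, 265),
    "Positive reframing": (83, 72, 193),
    "Engaging in relaxation techniques": (66, 60, 123),
    "Problem-solving": (32, 36, 69),
    "Seeking social help": (25, 18, 40),
}
NEGATIVE = {
    "Venting of emotions": (56, 50, 100),
    "Avoidance": (45, 39, 75),
    "Denial": (59, 35, 74),
    "Wishful thinking": (24, 20, 30),
    "Self-blame": (8, 6, 14),
}

def summarize(counts, tool_index):
    """Return (pooled percent, mean of per-strategy percents) for one SA tool."""
    wrong = [row[tool_index] for row in counts.values()]
    totals = [row[2] for row in counts.values()]
    pooled = 100 * sum(wrong) / sum(totals)
    per_strategy = [100 * w / t for w, t in zip(wrong, totals)]
    return pooled, sum(per_strategy) / len(per_strategy)

for group, counts in (("positive", POSITIVE), ("negative", NEGATIVE)):
    for tool, index in (("TextBlob", 0), ("VADER", 1)):
        pooled, mean = summarize(counts, index)
        print(f"{group:8s} {tool:8s} pooled={pooled:.2f}% mean={mean:.2f}%")
```

The pooled figure weights each strategy by its number of tweets, so large categories such as “mindfulness practices” dominate it, whereas the mean treats every strategy equally.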

Linking words and text to coping strategies

An analysis of co-occurring words using bigram models was conducted to gain insights into the relationships between words in the positive coping and negative coping text datasets independently. Figure 8 shows a network of the 50 most frequent bigrams in the positive coping text corpus. The bigram “oh god” appeared in the top ten bigrams in the positive coping data and was related to the mindfulness practice of praying. The “wear mask” and “covid vaccine” bigrams, which are specific to COVID-19-related problem-solving techniques, also appeared in the top ten positive bigrams. Figure 8 shows more bigrams that can be linked to positive coping strategies, including “stay safe,” “social distancing,” “oh bless,” “return normal,” “live long,” and “thank god.”


Figure 8: Network diagram of the top 50 occurring bigrams in the positive coping dataset.

Figure 9 shows the 50 most frequent bigrams in the negative coping text corpus. The bigram “swear god” appeared as the top bigram in the negative coping category. The bigrams “must solved,” “month isolation,” “leave family,” and “isolation must” also appeared in the top 12 bigrams in the negative coping category. Other bigrams included “biological war,” “god deconfinement,” “god forbidden,” “go mall,” “swear suffocating,” and “mentally ill.”


Figure 9: Network diagram of the top 50 occurring bigrams in the negative coping dataset.
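Bigrams such as “wear mask” and “must solved” suggest that stop words were removed before adjacent word pairs were counted. A minimal sketch of that counting step follows, assuming a tiny illustrative stop-word list and a simple alphabetic tokenizer; the preprocessing actually used is in the supplemental notebooks and may differ. Two of the input lines are taken (one shortened) from Table 9, and the third is made up.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "in", "is", "are",
             "for", "if", "you", "or", "with", "that", "i"}

def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def top_bigrams(texts, n=50):
    """Count adjacent token pairs within each text and return the n most common."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)

tweets = [
    "Days without deaths, thank God",        # from Table 9
    "Wear a mask if you work in hospitals",  # shortened from Table 9
    "Thank God, I wear a mask",              # hypothetical
]
top = top_bigrams(tweets, n=3)
print(top)
```

Because “a” is dropped before pairing, “wear a mask” contributes the bigram (“wear”, “mask”), which matches the form of the bigrams reported in Figs. 8 and 9.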

Also of interest were the emojis within each coping category. Since social media users depend heavily on emojis for expressing their feelings, the bigrams that named emojis were prevalent in both the positive and negative categories. The top bigrams in the positive coping dataset included the “tear joy,” “folded hand,” “red heart,” “check mark,” “smiling eye,” “victory hand,” and “rolling laughing” emojis. The top bigrams in the negative coping dataset included the “broken heart,” “loudly cry,” and “double exclamation” emojis. Figure 10 shows the top emojis used in the positive and negative coping categories based on the bigram analysis. Moreover, the analysis reveals that positive coping tweets contain a wider variety of emojis than the negative coping tweets.


Figure 10: The top emojis used in the positive and negative coping categories based on bigram analysis.
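Emojis can only surface as bigrams if each symbol is first replaced by its text name. The sketch below illustrates that idea under explicit assumptions: the three-entry mapping is hand-written for the example, the tweet is made up, and the tool the authors used for this conversion is not named in this section.

```python
# Hypothetical hand-written mapping; a real pipeline would take names from
# a complete emoji database rather than this three-entry table.
EMOJI_NAMES = {
    "\u2764": "red heart",
    "\U0001F494": "broken heart",
    "\u2714": "check mark",
}

def replace_emojis(text):
    """Replace each known emoji with its text name, padded by spaces."""
    for symbol, name in EMOJI_NAMES.items():
        text = text.replace(symbol, f" {name} ")
    return " ".join(text.split())

def bigrams(text):
    """Return adjacent lower-cased token pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))

tweet = "vaccine done \u2714\u2764"  # made-up example
pairs = bigrams(replace_emojis(tweet))
print(pairs)
```

Two-word emoji names therefore show up directly as bigrams such as “check mark” and “red heart”. Adjacent emojis also produce incidental pairs across their boundary, which is one reason to read emoji bigrams as names rather than as word co-occurrences.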

In addition, LDA analysis was applied to the positive and negative coping datasets separately to identify their main themes and terminologies. Table 12 lists the topics found by LDA. In a psychological context, applying LDA to the positive coping dataset identified latent themes that emerged from individuals’ experiences with effective coping strategies. Within the negative coping dataset, the LDA analysis helped uncover topics related to barriers to coping and revealed challenges in managing crises.

Table 12:
LDA topic modeling results.
Topic theme Topic keywords
Positive Coping Dataset
The people seek God’s protection in taking the first dose of the vaccine, and they express their gratitude to God for providing this means of protection to save their lives from the disease. (god, vaccine, hand, folded, protect, disease, oh, live, first, thank)
The people express their feelings of joy as they take the second dose of the vaccine. (heart, smiling, red, vaccine, mark, check, love, done, ban, second)
The people assert and express themselves during quarantine, sharing their activities and feelings about their situations. (quarantine, world, want, since, back, hospital, week, make, need, love)
The people express their positive reactions to being able to socialize freely without the need for masks or social distancing. (joy, tear, wear, way, let, social, help, make, distancing, wearing)
As the people take the vaccine and the pandemic comes to an end, they all return to their normal pre-pandemic lives, this time with more precaution for viruses and illnesses. (time, vaccination, well, getting, life, return, normal, good, precaution, end)
As a new year arrives, the people still worry about their health, yet they celebrate optimistically with their families. (work, health, family, new, medical, year, please, laughing, right, need)
The people express their gratitude for the government’s efforts as the number of COVID-19 cases is reduced, children are allowed to attend school physically once again, the curfew has been lifted, and travelling is possible once again. (case, stay, number, school, safe, thank, curfew, government, let, travel)
Negative Coping Dataset
Negative feelings expressed towards lockdown and curfew procedures and the right for citizens to return home from abroad. (god, protect, time, need, power, right, return, end, lockdown, curfew)
Negative feelings expressed as a result of procedures imposed to control the pandemic. (go, back, ban, feel, vaccine, mask, world, home, cry, spread)
The people pray to God in their desperate situations, feeling lonely and miserable in their lives. (isolation, disease, god, swear, must, family, solved, leave, life, suffocating)
Many families have gone through hardships within the pandemic, as many breadwinners lost their jobs or encountered a decrease in their salary during the pandemic, thus leading many residents to a difficult financial situation. (family, government, vaccine, price, solution, work, swab, salary, expense, expatriate)
The people express their hatred towards the situation they are put in, especially being forced to wear masks and not being able to socialize in school; thus, they are heartbroken and tend to have many emotional breakdowns, for their lives feel so surreal. (heart, mask, quarantine, hate, broken, school, movie, free, cry, loudly)
DOI: 10.7717/peerj-cs.2828/table-12
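For readers unfamiliar with how LDA arrives at keyword lists like those in Table 12, the following is a from-scratch sketch of LDA fitted by collapsed Gibbs sampling. It is not the authors' implementation (their notebooks are in the supplemental material and most likely rely on a library); the hyperparameters `alpha` and `beta`, the iteration count, and the four mini-documents assembled from Table 12 keywords are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, top_n=5, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (top words per topic, topic sizes)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d assigned to topic k
    topic_word = [Counter() for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                       # tokens assigned to topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        assignments.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    top_words = [[w for w, c in tw.most_common(top_n) if c > 0] for tw in topic_word]
    return top_words, topic_total

# Hypothetical pre-tokenized mini-documents built from Table 12 keywords.
docs = [
    ["god", "vaccine", "protect", "thank"],
    ["vaccine", "god", "thank", "disease"],
    ["quarantine", "hospital", "week", "back"],
    ["quarantine", "back", "hospital", "world"],
]
topics, sizes = lda_gibbs(docs, n_topics=2)
print(topics, sizes)
```

The model only returns ranked keywords per topic; the theme descriptions in the first column of Table 12 are human interpretations of those keywords.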

Discussion

Feasibility study of the coping detection

The classification models performed reasonably well with a simple feature set that consisted of text-related features only. This provides promising evidence of an underlying structure in the coping strategies. Additionally, adding a user-related feature (for example, the number of accounts followed) along with the SA score had an impact on the classification performance. However, the inclusion of the SA features, as in Dataset V, introduced confusion and reduced the model’s ability to detect the positive coping category. The non-parametric (K-nearest neighbor) and the probabilistic (naïve Bayes) models performed better than the other learning methods. The K-nearest neighbor algorithm was more effective in capturing instances of individuals who had negative coping experiences, whereas naïve Bayes was better able to identify instances of individuals who used positive coping strategies. This provides some insight into the diverse nature of negative coping strategies, suggesting that non-parametric models could be more suitable for them.
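To make the probabilistic approach concrete, here is a minimal multinomial naïve Bayes classifier over word counts with add-one smoothing. It is a sketch rather than the authors' model: the class name, the tokenizer, and the two query sentences are assumptions, and the four training tweets (taken from Tables 9 and 10) are far too few to say anything about real performance.

```python
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    """Lower-case the text and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in tokens(text):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log(
                        (self.word_counts[label][w] + 1) / (total + len(self.vocab))
                    )
            scores[label] = score
        return max(scores, key=scores.get)

# Training tweets quoted from Tables 9 and 10.
train_texts = [
    "Days without deaths, thank God",
    "Saturdays are for new nails and a hair mask",
    "Back to quarantine",
    "Discrimination is the new norm #Covid19",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("thank god for today"))        # hypothetical query
print(model.predict("back to quarantine again"))   # hypothetical query
```

A K-nearest neighbor classifier would instead compare a new tweet with its closest labeled tweets in feature space, which makes no assumption about a single decision function per class.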

Thus, a number of insights can be inferred from the performance of the classification models. First, the small sample size of the data and the unbalanced distribution of the positive and negative instances of coping strategies affect the way negative coping strategies are classified based on their underlying structure. This makes a non-parametric distance-based algorithm perform better. Second, the probabilistic approach performed better in general and in predicting the positive coping strategies. This is expected given the variety and subjectivity of the expressions, which make it harder to find a single predictive function that represents all the possible expressions. To enhance the results, strategies such as optimizing hyperparameters, addressing class imbalance, and refining feature selection or feature engineering could be considered.
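One common way to address the class imbalance mentioned above is to weight each class inversely to its frequency during training. The sketch below applies the widely used “balanced” heuristic, n_samples / (n_classes × class_count), to the category totals obtained by summing the “Total tweets of category” column of Table 11 (690 positive, 293 negative). Whether the authors would adopt this particular remedy is an assumption.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }

# Category totals summed from Table 11.
weights = balanced_class_weights({"positive": 690, "negative": 293})
print({label: round(w, 3) for label, w in weights.items()})
```

With these weights, each class contributes the same total weight to the training objective, so errors on the smaller negative coping class are no longer outvoted by the larger positive class.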

Coping strategies analysis

Understanding the major types of positive and negative coping mechanisms and identifying the strategies within each type provided an insightful understanding of people’s behaviors during stressful conditions. It is important to highlight that 70% of the sample expressed positive coping strategies, with “mindfulness practices” and “positive reframing” being the most noticeable among the sample in Kuwait during the COVID-19 quarantine. This illustrates the importance of these strategies in navigating the challenges posed by the pandemic in this cultural context. “Mindfulness practices”, such as praying, meditation, and spiritual practices, have been extensively studied for their effectiveness in reducing stress, enhancing emotional regulation, and promoting overall well-being (Al-Ghazali, 1997; Tugade & Fredrickson, 2004). The “positive reframing” strategy, which involves consciously focusing on the positive elements of a situation or finding meaning in stressful situations, has been linked to greater resilience and psychological well-being (Tugade & Fredrickson, 2004).

In the overall context of the COVID-19 quarantine, where individuals faced intensified levels of stress, uncertainty, fear, and social isolation, the implementation of “mindfulness practices” and “positive reframing” served as effective coping strategies. These strategies helped individuals manage the psychological impact of quarantine, maintain a sense of control despite the uncertainty, and promote adaptive responses to adversity (Tugade & Fredrickson, 2004; Khoury et al., 2015).

Similarly, the results showed three main negative coping styles: “venting of emotions”, “avoidance”, and “denial”. It is important to highlight that small business owners and immigrants residing in labor areas with strict lockdown restrictions faced job insecurity and extreme isolation. Consequently, the lack of resources such as social support, job security, and financial aid increased the effects of negative coping styles during times of crisis such as the pandemic. LDA topics and negative bigrams reflected the expressions of such coping strategies. For example, negative feelings, mainly sadness, dispirit, fear, and anger, were present in all the discovered LDA topics including negativity toward the lockdown (first topic), the vaccine (second topic), city isolation (third topic), and job insecurity (fourth topic). Individuals employing these coping strategies may have experienced heightened levels of psychological distress and decreased overall well-being during periods of isolation and uncertainty.

Likewise, “avoidance” strategies were expressed as talks of hopelessness in many tweets. The third and fifth topics in Table 12 and bigrams such as “month isolation”, “swear suffocating” and “mentally ill” reflected the despair felt by users in the isolated cities at not being able to return to normal life. The “denial” coping strategy during the pandemic appeared mainly as conspiracy theories surrounding the source of the virus and doubts about the legitimacy of the virus. The bigram “biological war” provides evidence of such talk, inhibiting individuals from acknowledging and addressing their emotional responses to the pandemic and delaying effective adaptation and use of effective coping strategies.

Comparisons between coping analysis and SA

The results in Tables 10 and 11 show that SA methods, although effective in detecting the polarity and subjectivity of the text, do not capture the outcomes of the MH coping strategies. “Wishful thinking”, “denial”, and “avoidance” are negative coping strategies, yet they were the most confusing strategies for the SA methods: expressions of these strategies were categorized by the SA as having positive or neutral emotional tones. The highest miscategorized percentage was found for “wishful thinking”, given that it usually contained good but unrealistic wishes. The second most miscategorized strategy is “denial”, where the conspiracy-related discussions contributed significantly to the confusion and misclassification.

Conversely, another kind of miscategorization was detected when exploring the outcome of SA on positive coping expressions. For example, “positive reframing” and “problem-solving” include mentions of the negative aspects of a situation but present them within a positive context, making SA prone to categorizing the sentiment as negative. In addition, expressions that represented “engaging in relaxation techniques” were found to have neutral or sometimes negative expressions. Among the positive strategies, TextBlob’s highest percentage of miscategorized statements (62.5%) was linked to the “seeking social help” strategy, since its expressions usually included an explanation of the problem.

In general, negative coping strategies were miscategorized by SA more often than positive strategies were (on average, 66.57% vs 51.75% for TextBlob and 51.76% vs 44.5% for VADER). One of the main reasons for this is the contradiction between the positive expression of the statements and their negative hidden contextual meaning. The VADER sentiment analysis resulted in fewer miscategorizations than the TextBlob method in both the negative and positive categories. This outcome was expected given TextBlob’s simplicity and limitations in analyzing social media posts.

Cultural influence and its effect on resources and stressors

The high rate of positive coping strategies used in Kuwait during the COVID-19 quarantine highlights the influence of culture and resource availability on coping strategies (AlSumait, AlHeneidi & Smith, 2021). Kuwait’s socio-economic context, characterized by stable health services, job security, financial stability, and food security for a majority of the population, contrasts with situations in other regions where these resources were limited. As a result, individuals in Kuwait may have exhibited higher levels of adaptive coping strategies, such as seeking in-house social support, “engaging in relaxation techniques”, and maintaining routines, due to the relative abundance of resources available to them. Additionally, the cultural norms and values in Kuwait, such as strong social networks and spiritual practices, contributed to the use of effective coping strategies.

Understanding these cultural and contextual factors is fundamental for developing targeted interventions to support mental health and well-being during COVID-19 quarantine periods in Kuwait and other similar contexts. The LDA and bigram analyses provide evidence for this justification. For example, the first, second, and sixth topics within the LDA topics in Table 12 revolve around prayers, availability of resources such as vaccines, family, and medical care. Further evidence of prayer practices is provided by positive bigrams such as “oh god”, “oh bless”, and “thank god.” The rest of the topics and positive bigrams reflect normal positive responses to stressful situations. This relates to our analysis that the users had good resources to overcome the stressors during COVID-19.

However, the appearance of factors such as spiritual practices and beliefs, family, and individual quarantine experiences in both positive and negative coping data revealed that the relationship between an individual’s psychological state and those factors is complex and highly personalized, given the individual’s experience. While these factors can be sources of strength and resilience for some individuals, they may pose challenges and intensify stress for others. Coping strategies associated with adaptive coping and psychological resilience were identified, as well as factors that contribute to psychological distress and vulnerability.

Conclusion

In conclusion, positive coping strategies play a crucial role in promoting well-being during times of stress. Negative coping strategies may exacerbate distress and contribute to negative outcomes. Identifying the individuals’ stressors, their available resources, and understanding the different effects that the coping strategies have on well-being can inform interventions and MH support. The aim of such time-effective MH support is to enhance individuals’ resilience and coping skills in order to foster better mental health and overall quality of life. This research presented a novel psychological machine learning framework based on the DRIVE well-being model to achieve this objective.

The framework was used to detect coping strategies used during the stressful COVID-19 pandemic by analyzing social media posts. Statistical analysis, sentiment analysis, topic modeling, and classification learning were conducted to explore coping strategies. The analysis showed a set of expressions and topics that identified the underlying themes that characterized and categorized the coping processes during crisis events. The article also offered insights into the limitations of SA in identifying the coping strategies from the users’ social media posts. In general, negative coping strategies were miscategorized more often than the positive strategies. Machine learning classification models to detect positive and negative MH coping strategies performed reasonably well, illustrating the existence of an underlying semantic structure of the coping strategies. A coping detection tool could be developed in the future by using enhanced machine-learning models that can be applied to larger datasets.

The novelty of this work can be summarized as follows: introducing a complete framework for the DRIVE-Coping detection model, creating a labeled coping dataset collected from the X platform, analyzing the expressed coping strategies and identifying the themes and terminologies of each coping strategy, comparing the coping model with the widely used sentiment analysis to demonstrate its shortcomings in identifying coping strategies, and testing the framework by creating a coping classification model.

Understanding coping as a whole is a complex process that takes time to comprehend, and analyzing social media posts may not fully reflect the circumstances of users under stress. The coping process plays a fundamental role in understanding psychological well-being. In addition, analyzing the X users’ coping mechanisms is an informative approach to understanding people’s behaviors under uncertain and stressful situations. A longitudinal analysis of the same sample and analyzing their expressions through social networks may reflect a sense of positive or negative coping. Future work will focus more on exploring different ways to extract text relevant features. Results from the LDA and feature engineering approaches could be used to construct custom word sets to help achieve better classification accuracy.

The research approach has several limitations. First, coping detection is more complicated than learning only from text. It requires an overall examination of many other factors that contribute to the well-being of individuals, many of which are offline or cannot be reflected in social media posts. Second, coping is a long process that cannot be captured in one window of time. However, this framework can provide indicators of the current expression. A major future development of the framework would be a dynamic model that improves over time. Third, although the framework is general and can be implemented in any context and size, the dataset used to validate the framework is limited in time, size, and geographic location. It is also restricted to social media users, which makes the data not representative of the whole society and excludes population groups without access to social media. This naturally introduces cultural and geographic biases.

This work can be extended in many directions. The study can be improved by implementing deep learning classification methods and developing a large language model for the DRIVE-coping strategies. It is also interesting to perform a similar analysis across cultural differences. Coping mechanisms represent just one component of the DRIVE model. Future plans can be set to implement other factors of the model, most importantly detecting the demands and the resources from social media posts, as a step to create a holistic observatory to monitor and evaluate the mental health as a whole.

Taken together, the results of this study further illuminate the positive contributions of the DRIVE model to encouraging psychological well-being, resilience and mental health strategies in response to COVID-19, and they strongly support the introduction of the DRIVE model in public health interventions aimed to foster mental health support and coping-building strategies. Generally, our findings have important implications for public health practitioners, policymakers, and mental health professionals. In contribution to the implementation of the DRIVE model in intervention services aiming to enhance the resilience and mental health of communities, it is crucial for future researchers and practitioners to support the translated DRIVE tool, which can further assist in diffusing this innovative model.

Supplemental Information

Python notebooks for data preprocessing, feature engineering, LDA topic modeling, and prediction.

DOI: 10.7717/peerj-cs.2828/supp-1

Coping Dataset: 1000 tweets with coping labels.

DOI: 10.7717/peerj-cs.2828/supp-2