New chaosintegrated improved grey wolf optimization based models for automatic detection of depression in online social media and networks
 Published
 Accepted
 Received
 Academic Editor
 Muhammad Asif
 Subject Areas
 Algorithms and Analysis of Algorithms, Data Mining and Machine Learning, Network Science and Online Social Networks, Social Computing, Text Mining
 Keywords
 Chaos, Depression detection, Grey wolf optimization, Metaheuristic algorithms
 Copyright
 © 2023 Akyol
 Licence
 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
 Cite this article
 2023. New chaosintegrated improved grey wolf optimization based models for automatic detection of depression in online social media and networks. PeerJ Computer Science 9:e1661 https://doi.org/10.7717/peerjcs.1661
Abstract
Depression is a psychological effect of the modern lifestyle on people’s thoughts. It is a serious individual and social health problem due to the risk of suicide and loss of workforce, high chronicity, recurrence rates, and prevalence. Therefore, identification, prevention, treatment of depression, and determination of relapse risk factors are of great importance. Depression has traditionally been diagnosed using standardized scales that require clinical diagnoses or patients’ subjective responses. However, these classical techniques have some limitations such as cost, uncomfortability, subjectivity, and ineffectiveness. Social media data can be simply and efficiently used for depression detection because it allows instantaneous emotional expression and quick access to various information. Some machine learningbased methods are used for detecting the depression in online social media and networks. Nevertheless, these algorithms suffer from several drawbacks, including data sparsity, dimension explosion, restricted capacity for generalization, and low performance on imbalanced data sets. Furthermore, many machine learning methods work as blackbox models, and the constructed depression detection models are not interpretable and explainable. Intelligent metaheuristic optimization algorithms are widely used for different types of complex realworld problems due to their simplicity and high performance. It is aimed to remove the limitations of studies on this problem by increasing the success rate and automatically selecting the relevant features and integrating the explainability. In this study, new chaosintegrated multiobjective optimization algorithms are proposed to increase efficiency. New improved Grey Wolf Optimization algorithms have been proposed by integrating Circle, Logistic, and Iterative chaotic maps into the improved Grey Wolf Optimization algorithm. It is aimed to increase the success rate by proposing a multiobjective fitness function that can optimize the accuracy and the number of features simultaneously. The proposed algorithms are compared with different types of popular supervised machine learning algorithms and current metaheuristic algorithms that are widely and successfully used in depression detection problems. Experimental results show that the proposed models outperform machine learning methods, as evidenced by examining results with accuracy, Fmeasure, MCC, sensitivity, and precision measures. An accuracy value of 100% was obtained from proposed algorithms. In addition, when the confusion matrices are examined, it is seen that they exhibit a successful distribution. Although it is a new research and application area for optimization theory, promising results have been obtained from the proposed models.
Introduction
Depression is defined as a mental disorder that appears in the form of decreased sensitivity to stimuli, loss of initiative and selfconfidence, and strengthening of hopelessness and pessimism. Depression and emotional distress are psychological effects of the modern lifestyle on people’s thoughts. Depression can begin at any age, however, the World Health Organization reports that most suicides caused by depression occur in people between the ages of 15 and 29. Sleep disturbances, a lack of energy, a lack of feelings of worthlessness, interest in daily activities, inability to concentrate, and recurrent suicidality are all symptoms of depression (Liu et al., 2022). High levels of depression, a mental disease characterized by a low mood and a reluctance to participate in activities, have a significant impact on people’s thoughts and behavior (Zhao et al., 2023). Depression detection is critical for assisting in the relief of these threats. There are numerous reasons for depression, some of which humans do not fully comprehend. Figure 1 depicts seven of the widely accepted types of depression (Schimelpfening, 2023). Their characterizations with key features are also listed in this figure.
Identification, prevention, treatment of depression, and determination of relapse risk factors are of great importance as it is a serious individual and social health problem due to the risk of suicide and loss of workforce, high chronicity, and recurrence rates and prevalence (Amanat et al., 2022). Depression can have a negative impact on individual’s socioeconomic status. People who are depressed are less likely to socialize. Counseling and psychological therapies can aid in treating depression. Due to people’s disabilities in various situations, depression is a big concern on a global scale. Depression, which is a single mental health illness with a variety of diagnostic techniques, is known to cause major social issues in people, including suicide (Gupta & Sharma, 2021). Consequently, addressing the burden of mental health concerns calls for a comprehensive approach.
Depression has traditionally been diagnosed using standardized scales that require clinical diagnoses or patients’ subjective responses provided by attending clinicians, and these classical techniques have some limitations. First, people’s responses to traditional standardized scales are probably influenced by the patient’s mental state at the time, context, patient’s current mood, the clinicianpatient relationship, and the patient’s memory bias and past experiences. Classical diagnostic methods are also limited in terms of temporal granularity. Interviews and subjective diagnostic criteria often lead to ambiguous and contradictory diagnoses. The second limitation is that individuals may be ashamed or unaware of their depressive symptoms and are less likely to seek professional help, particularly in the early stages. According to the study (Shen et al., 2017), if the patient was in the early stages of depression, more than 70% of the population would not seek professional help, implying that they would likely wait for their symptoms to worsen before needing support. Finally, traditional methods of detecting depression, which rely on facetoface interviews, are expensive concerning both time and money, making them pricey for some people (Liu et al., 2022). The conventional clinical diagnosis for depression also does not provide an effective or systematic method for integrating behavioral findings that indicate the severity and presence of depression. As a result, more costeffective methods for detecting depression in large populations are required.
The resources available to recognize and manage states of depression or suicidality are typically insufficient. Online data and records are increasingly being recognized as viable sources of information for public health decisions. By the way, social media has developed into a medium for people to express their feelings, opinions, ideas, and emotions, which makes them feel satisfied to share ideas that are upsetting to them. People’s use of social media has evolved and created trends, and due to the social network’s growing popularity, more people are using it to publicly express themselves online. Adolescent use of digital and social media. Previous research has found that has changed the way they form and maintain friendships. While social media allows early teens to form new relationships online, providing them with the opportunity to connect with their peers, it also creates new obligations, expectations, and social dynamics that result from continued commitment to friendships. Persistent connection and the social norms that govern digital media can cause stress and strain on friendships (De Groote & Van Ouytsel, 2022). In addition, studies show that the higher the fear of depression, the stronger the habit of using and constantly checking social media. It shows that this can eventually trigger emotional and psychological states such as depression (Park, 2022).
Although questionnaires and interviews have historically been used in psychological research, many researchers are increasingly collecting data online. They seek to use the methods they provide to psychologically analyze social network data and draw conclusions. Using web text shared by individuals on social media, one can extract the emotions and expressions hidden within the text. Researchers have used data obtained from social media (posts, tweets, size of social network, likes, comments, join groups demographics, status updates, etc.) as indicators for online diagnosing the situation of users, such as whether or not the user suffers from depression (Latif et al., 2021). Because social media allows instantaneous emotional expression and easy access to a wealth of information, recognizing depression is now an important step in reducing the death rate (Aleem et al., 2022).
With rapid advances in machine learning approaches, it is critical to incorporate modern technology into depression diagnosis to prevent depressive individuals from committing suicide through earlier and more accurate depression detection. The classical clinical diagnostic method for depression does not appropriately identify the complexity of depression. Using machine learning methods, the symptom composition associated with mental disorders such as depression can be easily predicted and detected. As a result, the machine learningbased diagnostic approach appears to be a viable option for predictive analysis. Text, sensors, structured data, and multimodal technology interactions are the major domains used in the healthcare sector for extracting observations associated with mental disorders using machine learning (Aleem et al., 2022). Text messages, social media platforms, and clinical records can all be used to extract text sources. Audio signals and mobile phones can be used to analyze sensor data. Data extracted from traditional screening questionnaires, medical health records, and scales comprise the structured data. Data from human interactions with everyday technological equipment, virtual agents, and robots are included in the multimodal technology interactions.
Previous research has found that automatic textbased analysis of depressive symptoms can be used to detect depressing phrases or disrespectful sentences in blog posts or conversations as well as sentiment recovery from suicide notes (Liu et al., 2022). The classification of shared posts in blogs is critical to investigate to save desperate people’s lives. Fortunately, applying machine learning specifically to text data from social media can provide a viable solution to the problem of depression detection. Social media platforms such as Facebook, Twitter, microblogs, and discussion forums have long been popular for expressing and documenting people’s thoughts, personalities, feelings, behaviors, and moods (Liu et al., 2022). However, research into extracting depression symptoms from text remains promising. Machine learning has revolutionized latent knowledge extraction in the field of text processing tending to understand the context of complex natural language sentences (Gupta & Sharma, 2021). Existing machine learningbased depression detection systems go through a series of steps, including preprocessing, feature extraction, and depression detection. Existing depression diagnostic research (Goodarzi et al., 2017) has discovered several flaws in accurately detecting depression in a timely manner. Traditional research has acquired accurate information about depressionindicative symptoms to effectively identify and diagnose depression from usergenerated natural language text. Even though it is a crucial task, extracting the depressionindicative symptoms from violationrich social media material results in erroneous depression detection. Therefore, to enhance the effectiveness of depression identification in social media, numerous artificial intelligence depression detection models have employed feature extraction and data preprocessing techniques to the raw input textual messages to produce a potential feature set for the learning model (William & Suhartono, 2021; Kim, Lee & Park, 2021; Wongkoblap, Vadillo & Curcin, 2021; Figueredo & Calumby, 2020; Lin et al., 2020; Tao et al., 2016). Extensive monitoring of online social networks and the use of machine learning methods on data obtained from social media can automatically diagnose depressive symptoms, helping identify individuals at risk of depression and supporting current screening methods.
Although some machine learningbased methods applied to social media data have the potential as a way to identify people at depression risk through largescale monitoring of social media and could complement classical screening methods, some methods achieved poor results in terms of different metrics. Furthermore, many machine learningbased methods suffer from some drawbacks, including data sparsity, dimension explosion, restricted capacity for generalization, and low performance on imbalanced data sets. Additionaly, many machine learning methods work as blackbox models, and the constructed depression detection models are not interpretable, explainable, and comprehensible. To increase the success rate and automatically integrate explainability for depression detection in online social media and networks, new chaosintegrated optimization algorithms are proposed in this article. Chaosintegrated improved Grey Wolf Optimizer (GWO) can also be used as a general solution search and optimization algorithm for different types of complex problems.
A study in which chaotic maps were adapted to metaheuristic algorithms was conducted by Bacanin et al. (2021). This study examines the performance of an improved chaotic firefly algorithm and tests its ability to automatically select the optimal dropout rate in deep learning applications. Theoretical and practical experiments show that the proposed algorithm performs better than other methods. This work represents an important development in the field of deep learning that can contribute to the prevention of overfitting (Bacanin et al., 2021). Malakar et al. (2020) introduce a Genetic Algorithmbased hierarchical feature selection model for handwritten word recognition. They used this model to optimize local and global features from handwritten word images. They tested it on a dataset containing 12,000 samples of handwritten words written in Bangla script. With the proposed model, better results were obtained compared to existing methods by reducing the feature size by 28% while increasing the word recognition performance by 1.28% (Malakar et al., 2020). In these two studies, it is seen that the new research field successfully combines machine learning and swarm intelligence approaches and can achieve extraordinary results in different fields.
To emphasize the effects of depression on people’s mental health in modern life and the seriousness of this problem. The high risk of suicide and job loss, chronicity, relapse rates, and high prevalence show that depression is a major problem in terms of individual and social health. Therefore, it is important to diagnose, prevent, and treat depression and identify relapse risk factors. The limitations and deficiencies of traditional diagnostic methods require the investigation of new approaches to solve this problem. Social media data is seen as a useful resource as part of these new approaches, but existing methods in this area have limitations and challenges. In this context, overcoming the limitations in detecting and managing depression by using intelligent metaheuristic optimization algorithms is the motivation of this study.
The main contributions of this study are listed below:

Social media and network data are considered as problem search space, and the new improved optimization methods proposed in this study are adapted as solution search strategies and models for automatic depression detection.

New explainable, comprehensible, and interpretable depression detection models different from the classical black box algorithms are proposed.

The new improved chaosintegrated general purpose GWO algorithms are proposed as a global optimization and solution search methodology.

To improve efficiency, chaos theory is embedded in the optimization method and a flexible fitness function is created for the intelligent optimization algorithm. The fitness function is easily adaptable to different goals and can be efficiently added to the fitness computation.

The proposed new solution search methodologies, which include appropriate encoding type, flexible multiobjective fitness function, and chaos theory, can be easily applied to a variety of social network and media problems.

A new area of research and application is addressed for optimization theory.

Despite being an innovative approach, the proposed models yield encouraging outcomes.
The article is organized as follows: Related works are reviewed in the Related Works section. The theoretical background on GWO and chaotic maps is presented in the Theoretical Works section. The proposed general purposed chaosintegrated new improved GWO (NIGWO) algorithms and adaptation of these algorithms for the problem of depression detection are described in the Proposed Works section. The Experimental Results section presents the experimental results. Finally, the article is concluded and possible future works are mentioned in the Conclusions.
Related Works
There is currently no effective clinical description of depression. It limits and biases the diagnostic procedure. The difficulty of diagnosing depression depends not only on the patient’s educational level, cognitive aptitude, and honesty in describing their symptoms but also on the physicians’ experience and motivation. To effectively diagnose the degree of depression, much clinical expertise and knowledge are required (Aleem et al., 2022). In order to automatically evaluate the severity scale of depression, various automatic depression estimating methods have been established in recent years. A brief categorization of related works on depression detection is depicted in Fig. 2. Recent discussion forums have concentrated in particular on identifying depressed individuals through the analysis of their social media posts. In this section, related works about depression detection methods, models, and systems in online social media and networks are presented.
The effects of employing a supervised machine learning method to assess predictors for detecting posttraumatic severe depression were covered by Kim, Lee & Park (2021). As part of their research, they used Twitter users to create a model for each linguistic style. A deep neural network approach was put forward by Wongkoblap, Vadillo & Curcin (2021) to examine depression in social media. For their analysis, they used the Twitter dataset.
With the aid of preprocessing techniques, the depression detection model (Figueredo & Calumby, 2020) enhances the accuracy of the prediction. The textual data preparation for depression identification aids in mapping emotions to actual emotion words, which maximizes the early and highly relevant detection of depression. Through a deep visualtextual multimodal learning method, the Sense Mood model (Lin et al., 2020) efficiently analyses and detects users’ depression. Based on an analysis of the urgent demand for methods, the sentiment analysisbased depression detection approach (Tao et al., 2016) detects potential depression early. It determines the depressive people using the analysis of the psychological status of the users in the social network by analyzing the usergenerated content in the social network through sentiment analysis without compromising the privacy of the users in the social network. To prevent suicide, the study (Leiva & Freire, 2017) sequentially analyzes user posts in a social network to detect the risk of depression early on. Furthermore, it penalizes the delay in detecting depressive people by using a combination of learning algorithms to recognize early depression symptoms.
By analyzing clinical depression and online behaviors from various social media perspectives, the multimodal dictionary learning approach (Shen et al., 2017) extracts six depressionrelated attribute groups. It detects depressed users in the Twitter network in realtime using welllabeled depression and nondepression data, assisting in proactive care for depressed social users. The study Trotzek, Koitka & Friedrich (2018) uses a convolutional neural network and various word embeddings to detect depression from userlevel linguistic metadata analysis. Using the learning model, the indications of depression in written texts generated by social users are classified.
The researchers proposed a lexiconbased approach for detecting depression (Li et al., 2020). Word2Vec, the label propagation algorithm, and a semantic relationship graph were used to create a lexicon. The authors based their findings on 111,052 Weibo microblogs consisting of 1,868 depressed or nondepressed users. To predict depression, they used six features and five classification methods. The results show that the generated lexicon performed better for classification.
Most previous studies had different limitations in terms of test dataset size, text analysis languages, single platform, and levels of depression, which aid psychiatrists in following treatment steps according to the detected level. Social media text analysis can develop methods for early detection of associated ailments and for determining the mental health status of users (Trifan et al., 2020).
Many methods for depression detection in online social network messages either use person descriptivebased highlighting approaches or textualbased highlighting approaches. The linguistic features of social network text, such as words, ngrams, partofspeech, and other linguistic characteristics, are the focus of textualbased features (Chiong, Budhi & Dhakal, 2021a; Chiong et al., 2021b). In contrast, the descriptivebased featuring methodology focuses on subject descriptions such as employment status, age, gender, smoking, alcohol or drug consumption, income, and other details about the patient or subject (De Souza Filho et al., 2021). However, it can be concluded that people’s actions on social media have multiple dimensions. Understanding the interaction through various modalities is difficult, as is characterizing people from selective viewpoints. Table 1 summarizes the literature works about depression detection in online social media and networks along with the limitations.
Study  Data Source  Methods  Results  Limitations 

AlSagri & Ykhlef (2020)  Naïve Bayes, Support Vector Machines, Decision Tree  Accuracy: 0.825 Precision: 0.739 Sensitivity: 0.850 Fmeasure: 0.791 AUC: 0.780 
Utilizing the different kernels in SVM Not avoiding overfitting dataset Lack of interpretability and comprehensibility 

Kim et al. (2020)  XGBoost, Convolutional Neural Network  Accuracy: 0.751 Precision: 0.891 Sensitivity: 0.718 Fmeasure: 0.795 
Low success rate Lack of explainability Limited scope 

Fatima et al. (2018)  LiveJournal  Random Forest  Accuracy: 0.918  Small size of data Single machinelearning algorithm Absence of other evaluation metrics Inefficiency in unbalanced dataset 
Orabi et al. (2018)  Convolutional Neural Network, Bidirectional LSTM  Accuracy: 0.850  Small size of data Lack of interpretability and comprehensibility Hardness task of exercising for RNN Report and Slope disappearing problems 

De Choudhury et al. (2013)  Support Vector Machines  Accuracy: 0.700  Low success rate Lack of explainability Single machinelearning algorithm Inappropriateness for large data sets 

Thorstad & Wolff (2019)  Cluster Analysis, Logistic Regression  Accuracy: = 0.390 Fmeasure = 0.380 
Low performance Lack of interpretability 

Nadeem (2016)  Support Vector Machines, Decision Tree, Naïve Bayes, Logistic Regression 
Accuracy: 0.860 Sensitivity: 0.830 Fmeasure: 0.840 
Usage of the old dataset Emphasis on user confession Lack of interpretability 

Aldarwish & Ahmad (2017)  Facebook, Twitter, and LiveJournal  Support Vector Machines, Naïve Bayes,  Accuracy: 0.633 Sensitivity: 0.570 
Usage of old Arabic dataset Limited phrases and sentences Low success rate 
Gaikar et al. (2019)  Support Vector Machines–Naïve Bayes hybrid model  Accuracy: 0.850  High computational complexity for comparing the longshort snippets. Need for determining the appropriate values for combined methods Lack of interpretability 

Islam et al. (2018)  Support Vector Machines and LIWC  Accuracy: 0.700  Inappropriateness for large data sets Lack of interpretability 

Wang et al. (2018)  Convolutional Neural Network  Fmeasure: 0.670  Not encrypting the situation and alignment of an entity Lack of interpretability 

Burdisso, Errecalde & MontesyGómez (2019)  SS3  Fmeasure: 0.610 Precision: 0.630 Sensitivity: 0.600 
Small size of data Low performance Lack of interpretability 

Adarsh et al. (2023)  Oneshot Decision, Combining of SVM and KNN  Accuracy: 0.981 Precision: 0.968 Sensitivity: 0.976 Fmeasure: 0.973 AUC: 0.979 
Lack of handling multiclass depression classification Requiring too many parameters that should be adjusted a priori 

Gupta, Pokhriyal & Gola (2022)  Combining Convolutional Neural Network and LSTM  Accuracy: 0.940 Precision: 0.942 Sensitivity: 0.937 Fmeasure: 0.940 
Necessity of many values that should be adjusted a priori for the combined methods Lack of explainability Inappropriateness for unbalanced data sets 

Chen et al. (2023)  Combining Convolutional Neural Network and SBERT  Accuracy: 0.860 Precision: 0.850 Sensitivity: 0.870 Fmeasure: 0.860 
High computational cost Lack of explainability Requiring too many parameters 
Theoretical Background
Intelligent metaheuristic optimization algorithms are widely used for different types of complex realworld problems due to their simplicity and high performance (Akyol, 2022). Many problems can be modeled as an optimization problem, and metaheuristic methods can be easily adapted as solution search strategies for the focused problem. These algorithms are categorized into nine different classes based on the inspiration sources (Akyol & Alatas, 2017). Specifically, biologybased and swarmbased metaheuristic methods are preferred for the search and optimization problem. In this article, a swarmbased GWO algorithm is adapted as a base for an automatic and direct solution search strategy for the focused depression detection problem.
Grey wolf optimizer
The grey wolf optimizer (GWO) is a swarmbased metaheuristic method inspired by the hunting skills and leadership hierarchies of gray wolves within the swarm. There are four types of wolves in the GWO method, which is inspired by the hunting behavior of gray wolves and their social leadership in nature. The α, β, and δ wolves have the best position, respectively, and the ω wolves, who are out of these wolves and will be led by these wolves. Hunting of wolves consists of three stages: encircling, hunting, and attacking the prey (Mirjalili, Mirjalili & Lewis, 2014).
Encircling: It is the stage of encircling the prey by gray boxes and is expressed by Eqs. (1) and (2).
(1)$D=C\times {X}_{p}\left(t\right)X\left(t\right)$ (2)$X\left(t+1\right)={X}_{p}\left(t\right)A\times D$
X_{p} represents the position of the prey, D represents the distance between the current candidate wolves and the top three wolves, t represents the current iteration, X is used for the position vector of a gray wolf, and the coefficients A and C are the coefficients computed by the equations in Eqs. (3) and (4) (Mirjalili, Mirjalili & Lewis, 2014).
(3)$A=2\times a\left(t\right)\times {r}_{1}a\left(t\right)$ (4)$C=2\times {r}_{2}$
The element values of vector a are decremented from 2 to 0 each time the algorithm is run. r_{1} and r_{2} are random vectors that take values in the range [0, 1].
Hunting: At this stage, which mathematically models the hunting behavior of wolves, the ω wolves follow the α, β, and δ wolves, which are assumed to know the location of the prey best. The equations for the hunting phase are given in Eqs. (5)–(7) (NadimiShahraki, Taghian & Mirjalili, 2021). (5)$\begin{array}{c}\hfill {D}_{\alpha}={C}_{1}{X}_{\alpha}X\left(t\right)\hfill \\ \hfill {D}_{\beta}={C}_{2}{X}_{\beta}X\left(t\right)\hfill \\ \hfill {D}_{\delta}={C}_{3}{X}_{\delta}X\left(t\right)\hfill \end{array}$
C_{1}, C_{2}, and C_{3} are calculated using Eq. (4). (6)$\begin{array}{c}\hfill {X}_{i1}={X}_{\alpha}\left(t\right){A}_{i1}\times {D}_{\alpha}\left(t\right)\hfill \\ \hfill {X}_{i2}={X}_{\beta}\left(t\right){A}_{i2}\times {D}_{\beta}\left(t\right)\hfill \\ \hfill {X}_{i3}={X}_{\delta}\left(t\right){A}_{i3}\times {D}_{\delta}\left(t\right)\hfill \end{array}$
X_{α}, X_{β}, and X_{δ} are the three best candidate solutions in the t iteration. A_{1}, A_{2}, and A_{3} are computed using Eq. (3). (7)$X\left(t+1\right)=\frac{{X}_{i1}\left(t\right)+{X}_{i2}\left(t\right)+{X}_{i3}\left(t\right)}{3}$
Attacking: The process of hunting wolves ends when the prey terminates moving. After this point, the attacking process begins. This process is mathematically expressed as a decrease in the value of a from 2 to 0 over the iterations. According to Emary, Zawbaa & Grosan (2017), half of the iterations are for exploration and the other half for exploitation. In the attacking step, the wolves move their position to an arbitrary position between the prey position and their current position.
In the algorithm, first, the wolf population placed at random locations in the search area is created and the fitness values are calculated according to the locations of these wolves. The algorithm is terminated when it reaches the MaxIter number. In each iteration, the encircling, hunting, and attack steps are repeated. a, which represents the best position of the prey, is the solution. The improved GWO (IGWO) algorithm has been proposed because of the risks of early convergence, the imbalance between exploration and exploitation, and the population diversity of the standard GWO algorithm (NadimiShahraki, Taghian & Mirjalili, 2021).
Improved grey wolf optimizer
In the standard GWO; the α, β, and δ wolves with the best positions direct the remaining ω wolves to promised areas in the search space, and the decrease in diversity of population may cause the GWO to be stuck at the local optimum. To overcome these problems, a new search strategy is associated with the update and select step in the proposed IGWO. IGWO includes initialization, movement, and selection and update phases.
Initializing stage: In this stage, random N wolves are created in the search space, as shown in Eq. (8) (NadimiShahraki, Taghian & Mirjalili, 2021). (8)${X}_{ij}={l}_{j}+ran{d}_{j}\left[0,1\right]\times \left({u}_{j}{l}_{j}\right),i\in \left[1,N\right],j\in \left[1,M\right]$
M represents the size of the problem, l_{j} represents the lower limit that the j th decision variable can take, and u_{j} represents the upper limit. rand_{j} generates a random number between 0 and 1. The fitness value of ${X}_{i}\left(t\right)$, which represents the position of the ith wolf in the tth iteration, is evaluated according to the fitness function.
Movement stage: In addition to the group hunting strategy, a dimensional learningbased hunting (DLH) strategy was added for individual hunting at this stage. In the DLH stage, each wolf is learned by neighboring wolves as another candidate for the newly available position of ${X}_{i}\left(t\right)$ (NadimiShahraki, Taghian & Mirjalili, 2021).
Canonical GWO search strategy: After determining the α, β, and δ wolves with the best fitness values in the population, the linear decreasing coefficient a and the coefficients A and C are computed by Eq. (3)–(5). As shown in Eqs. (5) and (6); X_{α}, X_{β}, and X_{δ} are used to find the surrounding prey. Finally, Eq. (8) is used to calculate the new position of ${X}_{i}\left(t\right)$.
DLH search strategy: In the GWO, location updates are made based on the three lead wolves in the population. This causes the algorithm to converge slowly and reduces the diversity in the population. To overcome these weaknesses of the algorithm, the proposed DLH search strategy considers other wolves in the population when updating the location. The position of the wolf ${X}_{i}\left(t\right)$ in the DLH search strategy is calculated using Eq. (11). When updating the location, a randomly selected wolf from the population and different neighbors are used. In this strategy, in addition to ${X}_{iGWO}\left(t+1\right)$ for the wolf ${X}_{i}\left(t\right)$, another candidate solution, ${X}_{iDLH}\left(t+1\right)$ is produced. First, the value of ${R}_{i}\left(t\right)$, which is the Euclidean distance between the candidate solution ${X}_{i}\left(t\right)$ and ${X}_{iGWO}\left(t+1\right)$, is calculated using Eq. (9) (NadimiShahraki, Taghian & Mirjalili, 2021). (9)${R}_{i}\left(t\right)=\u2225{X}_{i}\left(t\right){X}_{iGWO}\left(t+1\right)\u2225$
Then, the neighbors of ${X}_{i}\left(t\right)$ denoted by ${N}_{i}\left(t\right)$ are found using Eq. (10) depending on the radius of ${R}_{i}\left(t\right)$. (10)$N}_{i}\left(t\right)=\left\{{X}_{j}\left(t\right){D}_{i}\left({X}_{i}\left(t\right),{X}_{j}\left(t\right)\right)\le {R}_{i}\left(t\right),{X}_{j}\left(t\right)\in Pop\right\$
Here, D_{i} represents the distance between X_{i} and X_{j}. After the neighborhood of ${X}_{i}\left(t\right)$ is created, and multineighbor learning is calculated using Eq. (11). The dth dimension of ${X}_{iDLH}\left(t+1\right)$ is calculated using a randomly selected neighbor ${X}_{n,d}\left(t\right)$ from ${N}_{i}\left(t\right)$ and randomly selected ${X}_{r,d}\left(t\right)$ from the population. (11)${X}_{iDLH,d}\left(t+1\right)={X}_{i,d}\left(t\right)+rand\times \left({X}_{n,d}\left(t\right){X}_{r,d}\left(t\right)\right)$
Selection and update stage: This is the stage where the most suitable individual is determined according to the fitness values of the two candidates, X_{i−GWO} and X_{i−DLH}. (12)${X}_{i}\left(t+1\right)=\left\{\begin{array}{cc}{X}_{iGWO}\left(t+1\right),\phantom{\rule{10.00002pt}{0ex}}\hfill & iff\left({X}_{iGWO}\right)<f\left({X}_{iDLH}\right)\hfill \\ {X}_{iDLH}\left(t+1\right)\phantom{\rule{10.00002pt}{0ex}}\hfill & otherwise\hfill \end{array}\right.$
After all, these operations, if the fitness value of ${X}_{i}\left(t+1\right)$ obtained is better than ${X}_{i}\left(t\right)$, the value of ${X}_{i}\left(t+1\right)$ is updated as the new position using Eq. (12). Otherwise, ${X}_{i}\left(t\right)$ remains unchanged. After all of these processes have been applied for each candidate solution in the population, the number of iterations is increased by one and the process repeats until the termination conditions are met (NadimiShahraki, Taghian & Mirjalili, 2021).
New improved grey wolf optimizer (NIGWO)
In this study, the NIGWO algorithm, which is a new version of the IGWO algorithm proposed by Akyol, Yildirim & Alatas (2023) was used. As seen in Eq. (3) in the IGWO algorithm, the random number r_{1} is used when calculating the A coefficient. To enhance the algorithm performance, a value that linearly decreases from alpha to 0 given in Eq. (13) is used instead of a random number. In this study, 2 was chosen as the constant alpha value and MaxIter represents the maximum iteration number. (13)${r}_{1}=alphat\times \left(alpha/MaxIter\right).$
Chaotic maps
Generating longterm random sequences with good uniformity is essential in sampling, numerical analysis, decisionmaking, and especially metaheuristic optimization. Successful generation of random numbers reduces computation and storage time to obtain the desired accuracy (Cheng, Tran & Cao, 2016). Recently, random number sequences have been replaced by chaotic number sequences and provide better performance in many complex problems. The use of chaotic number sequences has increased due to their spread spectrum properties, theoretical unpredictability, irregularity, stochastic properties, and related ergodic properties (Akyol, Yildirim & Alatas, 2022). In the literature, some studies increase performance by adding chaotic maps to optimization algorithms. The integrated version of the chaotic maps into the Optics Inspired Optimization algorithm was used in the deception detection problem (Bingol & Alatas, 2023). Tent chaotic maps are embedded to increase the initial population diversity of the Bald Eagle Search Optimization method (Shen et al., 2022). The Bernoulli chaotic map is integrated into the Sparrow Search Algorithm to improve the richness and diversity of the population (Wang et al., 2022).
Circle map
Circle map, one of the onedimensional chaotic maps, was defined by Zheng (1994). This map, which belongs to the family of dynamical systems on the circle, is calculated as given in Eq. (14). (14)${z}_{m+1}={z}_{m}+{y}_{2}\frac{{y}_{1}}{2\pi}sin\left(2\pi {z}_{m}\right)$
Here, m is the generated chaotic number and z_{m} is the m th chaotic number in the range [0,1]. The y_{1} and y_{2} parameters also take the values of 0.5 and 0.2, respectively (Lucay & Jamett, 2021).
Logistic map
The logistic map, introduced by May (2004), is one of the simplest onedimensional chaotic maps and is described as shown in Eq. (15) (May, 1976; Lucay & Jamett, 2021). (15)${z}_{m+1}=\mu {z}_{m}\left(1{z}_{m}\right)$
Here, the µvalue is taken as 4 (Lucay & Jamett, 2021).
Iterative map
Like the logistic map, Iterative Map was introduced by May. It is an Iterative Chaotic Map with infinite collapse and is expressed as shown in Eq. (16). (16)${z}_{m+1}=sin\left(\frac{y\pi}{{z}_{m}}\right)$
where y ∈ (0, 1) is a suitable parameter (May, 1976; Gandomi et al., 2013). The initial value of z_{1} is taken as 0.7 for all three chaotic maps.
Proposed Methods
Proposed chaotic NIGWO algorithms
The convergence feature of NIGWO is affected due to its stochastic nature, which uses a random number sequence for parameters during operation. As with other evolutionary computationbased algorithms, NIGWO has no analytical results dependent on a specific number generator, which guarantees performance improvements. In this study, chaotic maps that emerge in nonlinear dynamics of biological populations with chaotic behavior are used for randomly generated numbers, and in this way, it is aimed to increase the effectiveness of NIGWO. Based on this, four different chaotic NIGWO algorithms are proposed using chaotic maps for the depression detection problem.
CNIGWO1
The initiation step of NIGWO can affect the success of convergence. Thus, various chaotic systems are used rather than random number sequences in the start step of NIGWO. Therefore, a chaotically initiated NIGWO termed chaotic NIGWO (CNIGWO1) is proposed in this study. In this way, the global convergence of NIGWO is tried to be improved and kept away from being trapped in local solutions. Equation (17) is used instead of Eq. (8), which is used to create the initial population in the NIGWO algorithm. (17)${X}_{ij}={l}_{j}+{z}_{m+1}\times \left({u}_{j}{l}_{j}\right),i\in \left[1,N\right],j\in \left[1,D\right]$
The variable z_{m+1} is calculated using Eqs. (14), (15) and (16). The pseudocode of the initial step is given in Fig. 3 and the flowchart of the CNIGWO1 algorithm is shown in Fig. 4.
CNIGWO2
Second, in this proposed method, the randomly generated C_{1} value in the range [0, 2] in Eq. (5) in NIGWO is generated with chaotic maps. The resulting new expression is defined in Eq. (18). (18)${D}_{\alpha}=\left(2\times {z}_{m+1}\right){X}_{\alpha}X\left(t\right)$
CNIGWO3
Third, in CNIGWO3, the randomly generated C_{2}value used for the hunting phase of the β wolf in Eq. (5) is generated with chaotic maps. The resulting new expression is defined in Eq. (19). (19)${D}_{\beta}=\left(2\times {z}_{m+1}\right){X}_{\beta}X\left(t\right)$
CNIGWO4
Finally, in Eq. (5), chaotic maps are used instead of the C_{3} value used to improve the hunting skills of the δ wolf. The updated version of the equation is given in Eq. (20). (20)${D}_{\delta}=\left(2\times {z}_{m+1}\right){X}_{\delta}X\left(t\right)$
The flowchart of CNIGWO2, CNIGWO3, and CNIGWO4 algorithms are shown in Fig. 5.
Proposed chaotic intelligent optimizationbased model for depression detection
In this study, depression detection is handled as a search problem. The chaotic NIGWO algorithm is proposed to solve this problem. The KNN classifier and feature selection were used as components of the fitness function. An overview of the proposed chaosintegrated optimization model for the problem of depression detection is demonstrated in Fig. 6. The initial data set is separated as a train and test set for the models. Then, necessary text preprocessing and feature engineering are performed. NIGWO and four proposed chaotic NIGWO algorithms are adapted as new models for the interested problem. Accuracy from KNN and the number of features are simultaneously optimized for the classification model metrics. In the model created with the document matrix and optimization algorithms, the data allocated for testing was then classified and the performance of the model was measured. Furthermore, different types of popular supervised machine learning algorithms are also used for performance comparison of the proposed models. Accuracy, Fmeasure, MCC, sensitivity, and precision metrics are used to assess the success of proposed optimizationbased models and machine learning algorithms.
Preprocessing of the depression dataset
In this study, the dataset created by labeling the data from Reddit (Daru, 2023) as “depressive” or “nondepressive” was used. This dataset consists total of 4,737 data, of which 3,937 are depressed and 900 are “nondepressed”. This dataset is divided into two separate datasets, 80% training and 20% testing. A section from this dataset is shown in Fig. 7.
The dataset used in this study was preprocessed and a document matrix was obtained. The number of words extracted from the dataset because of preprocessing was 1,117. In the document matrix, the rows represent the comments, while the columns correspond to the words in all comments. If the word is mentioned in the comment on the relevant line, it takes the value 1, if not, it takes the value 0. A section of the document matrix is shown in Fig. 8.
Representation type (encoding)
The position vector of gray wolves for the focused depression detection problem is represented as follows: ${X}_{N,M}=\left[\begin{array}{cccc}{X}_{1,1}\hfill & {X}_{1,2}\hfill & \cdots \hfill & {X}_{1,M}\hfill \\ {X}_{2,1}\hfill & {X}_{2,2}\hfill & \cdots \hfill & {X}_{2,M}\hfill \\ \vdots \hfill & \vdots \hfill & \ddots \hfill & \vdots \hfill \\ {X}_{N,1}\hfill & {X}_{N,2}\hfill & \cdots \hfill & {X}_{N,M}\hfill \end{array}\right]$
In this encoding type, N is the number of wolves and M is used for representing the number of features extracted from the depression dataset. In other words, M represents the number of words extracted from the dataset, and N represents the number of population.
Fitness function
While calculating the fitness of each individual in the population, the accuracy value obtained from the KNN classifier was used. In addition, a feature selection process is performed so that only the necessary features are included in the model. Thus, a multiobjective approach is used. The fitness function consisting of the combination of these two objectives is calculated as shown in Eq. (21). (21)$fitness={w}_{1}\times AccuracyObtainedfromKNN+{w}_{2}\times FeaturesRatio$
Here, AccuracyObtainedfromKNN gives the accuracy rate obtained from the KNN classifier. FeaturesRatio gives the ratio of the number of features in the model to the total number of extracted features in the dataset. w_{1} and w_{2} represent the weights and their sum is equal to 1. In this study, the w_{1} value was taken as 0.9 and the w_{2} value as 0.1. Due to the direct usage of the selected features for classification, the proposed models seem to have explainable capability. While creating the model, it is aimed to include only the necessary features in the model with feature selection. In the fit function, the number of features to be included in the model is kept low with FeaturesRatio. It is known how many features are included in the proposed model and which of these features are. In this way, the model has the characteristics of being explainable, interpretable, and comparable.
Usability of the proposed method in live tweets
The proposed model in this article can be easily adapted to live or realtime tweets. After the live Tweets are preprocessed, they can be classified using the constructed model. However, realtime Tweet data are very large and in this manner, a Hadoopbased framework that allows the user to acquire and store tweets in a distributed environment and process them for detecting sarcastic content in realtime using the MapReduce programming model can be used. The mapper class can work as a partitioner and divides a large volume of tweets into small chunks and distributes them among the nodes in the Hadoop cluster. The reducer class can work as a combiner and can be responsible for collecting processed tweets from each node in the cluster and assembling them to produce the final output. Furthermore, due to the unknown class label of the Tweets, the performance of this system can be checked using other online tools.
Experimental Results
In the study, the results were obtained using MATLAB 2021b environment. Accuracy, precision, sensitivity, Fmeasure, and MCC criteria were used to evaluate the results obtained. First, the proposed CNIGWO1, CNIGWO2, CNIGWO3, and CNIGWO4 algorithms were evaluated according to the chaotic maps used in them. Because of running each algorithm 20 times, the best result, mean result, worst result, and the number of features obtained from the best result are given. In the algorithms, the population number is taken as 30 and the number of iterations as 100.
Table 2 shows the results of running the CNIGWO1 algorithm 20 times, the best result, the worst result, the mean result, and the number of features obtained from the best result. These results show that the use of chaotic number sequences with spread spectrum properties, irregularity, stochastic properties, and related ergodic properties has improved the optimizationbased model. Accordingly, the best results were obtained from the algorithms in which the initial population was created using logistic and iterative chaotic maps. According to the calculated mean results, it is seen that the best mean result is obtained from the algorithm using the circle map. The NIGWO algorithm gave the worst result. Considering the number of features obtained in the best run, it is seen that the best result is obtained from the algorithm using the circle map with 526.
Best  Mean  Worst  Number of Features from Best Solution  

CNIGWO1 with circle map  0.9989  0.9987  0.9979  526 
CNIGWO1 with logistic map  1  0.9980  0.9968  553 
CNIGWO1 with iterative map  1  0.9983  0.9968  604 
NIGWO  0.9926  0.9909  0.9884  999 
The results obtained by using circle, logistic, and iterative chaotic maps for the CNIGWO2 algorithm are shown in Table 3. It can be seen that 100% accuracy is obtained using these three chaotic maps. It is clear that the best results are obtained from the integrated chaotic maps methods. Spread spectrum properties, stochastic properties, irregularity, and related ergodic properties of these maps also seem to be effective in obtaining these promising results. Based on the listed mean values, the best mean results are obtained from the circle and iterative chaotic maps. Concerning the number of features obtained from these algorithms, it is seen that the least number of features is taken from the version using the circle map.
Best  Mean  Worst  Number of Features from Best Solution  

CNIGWO1 with circle map  1  0.9989  0.9968  673 
CNIGWO1 with logistic map  1  0.9994  0.9989  699 
CNIGWO1 with iterative map  1  0.9998  0.9989  698 
NIGWO  0.9926  0.9909  0.9884  999 
Table 4 provides the outcomes of the circle, logistic, and iterative chaotic maps integrated with the CNIGWO3 algorithms. When this table is examined, 100% accuracy was obtained from the test data by integrating logistic and iterative chaotic maps into NIGWO. Based on the mean values presented, the iterative chaotic map produces the best mean results. In terms of the number of features retrieved from them, the version employing the logistic map has the fewest.
Best  Mean  Worst  Number of Features from Best Solution  

CNIGWO1 with circle map  0.9989  0.9985  0.9979  680 
CNIGWO1 with logistic map  1  0.9983  0.9979  611 
CNIGWO1 with iterative map  1  0.9996  0.9989  665 
NIGWO  0.9926  0.9909  0.9884  999 
The results of the chaotic maps integrated CNIGWO4 algorithms are summarized in Table 5. Examining these data reveals that incorporating all three chaotic maps into the NIGWO achieves 100% accuracy. The circle chaotic map yields the best mean results based on the supplied mean values. In terms of the number of features retrieved, it is evident that the circle map version has the fewest.
Best  Mean  Worst  Number of Features from Best Solution  

CNIGWO1 with circle map  1  0.9994  0.9979  594 
CNIGWO1 with logistic map  1  0.9983  0.9968  709 
CNIGWO1 with iterative map  1  0.9992  0.9979  641 
NIGWO  0.9926  0.9909  0.9884  999 
The proposed models are compared with popular supervised machine learning algorithms such as bagging, naive bayes, support vector machine, boosting (decision stump), random forest, and nearest neighbour classifier. According to the result presented in Aleem et al. (2022), these machine learning algorithms are the most widely used and successful methods in depression detection problems. In addition to machine learning algorithms, current metaheuristic algorithms Dandelion Optimizer (DO) (Zhao et al., 2022), Artificial Bee Colony Algorithm (ABC) (Karaboga, 2010), Harris Hawks Optimization (HHO) (Heidari et al., 2019), Sine Cosine Algorithm (SCA) (Mirjalili, 2016), Firefly Algorithm (FA) (Yang, 2010), and Bat Algorithm (BA) (Yang, 2011) comparison was also made. These metaheuristic algorithms used in the comparison were run 20 times, and the values of the best results obtained as a result of these 20 runs are presented in the figure. Confusion matrices obtained from the models based on the test depression data set are depicted in Fig. 9. In each confusion matrix, “0” represents “nondepressive” classes, and “1” represents “depressive” classes. Values in these confusion matrices show that; while worse results are yielded from supervised machine learning algorithms, better results are achieved from the proposed methods in this unbalanced data set. While the metaheuristic algorithms used in the comparison give better results than machine learning algorithms, it is seen that they give worse results than the proposed CNIGWO algorithms.
Performance metric values computed according to the confusion matrices from all algorithms are presented in Table 6. CNIGWO1 and CNIGWO3 with logistic and iterative maps and CNIGWO2 and CNIGWO4 algorithms with all chaotic maps achieved the best accuracy, precision, sensitivity, Fmeasure, and MCC metrics with a value of 1. CNIGWO1 and CNIGWO3 with Circle Map seem to be the secondbest methods in terms of accuracy, sensitivity, Fmeasure, and MCC metrics. Considering the results, it can be concluded that the proposed optimizationbased models performed better than the different types of popular machine learning algorithms with respect to all evaluation metrics. The reason for the high performance of the proposed and adapted optimization algorithms is the distributed populationbased search capability. Compared to other metaheuristic algorithms used in the comparison, it is seen that the proposed CNIGWO algorithms give better results. Generally speaking, it seems that successful results are obtained in the use of metaheuristic algorithms in detecting depression.
Accuracy  Precision  Sensitivity  FMeasure  MCC  

Bagging  0.8976  0.8920  0.8980  0.8900  0.6210 
Naive Bayes  0.7561  0.8550  0.7560  0.7820  0.4560 
Support Vector Machine  0.8268  0.8570  0.8270  0.7550  0.1840 
Boosting (Decision Stump)  0.8511  0.8740  0.8510  0.8050  0.3850 
Random Forest  0.8912  0.9000  0.8910  0.8720  0.5900 
Nearest Neighbour Classifier  0.8479  0.8320  0.8480  0.8360  0.4220 
DO  0.9588  0.9291  0.8187  0.8704  0.8485 
ABC  0.9599  0.9178  0.8375  0.8758  0.8532 
HHO  0.9599  0.9122  0.8438  0.8766  0.8536 
SCA  0.9578  0.8409  0.9250  0.8810  0.8568 
FA  0.9588  0.8954  0.8562  0.8754  0.8511 
BA  0.9599  0.8315  0.9563  0.8895  0.8683 
NIGWO  0.9926  0.9940  0.9938  0.9742  0.9740 
CNIGWO1 with circle map  0.9989  0.9938  1  0.9962  0.9962 
CNIGWO1 with logistic map  1  1  1  1  1 
CNIGWO1 with iterative map  1  1  1  1  1 
CNIGWO2 with circle map  1  1  1  1  1 
CNIGWO2 with logistic map  1  1  1  1  1 
CNIGWO2 with iterative map  1  1  1  1  1 
CNIGWO3 with circle map  0.9989  0.9938  1  0.9962  0.9962 
CNIGWO3 with logistic map  1  1  1  1  1 
CNIGWO3 with iterative map  1  1  1  1  1 
CNIGWO4 with circle map  1  1  1  1  1 
CNIGWO4 with logistic map  1  1  1  1  1 
CNIGWO4 with iterative map  1  1  1  1  1 
Box plots of the results obtained in depression detection of DO, ABC, HHO, SCA, FA, and BA, which were used to compare the results with the CNIGWO algorithms proposed in this study, are shown in Fig. 10. The graph shows the lower quartile, upper quartile, and median values obtained by running each algorithm 20 times. When the figure is examined, the lower quartile, upper quartile, and median values are close to each other in the proposed CNIGWO algorithms. This shows that the algorithm performs well in each run. In addition, when compared to other algorithms, it is seen that the algorithms proposed in this study give better results than other algorithms.
Finally, the experimental results were evaluated with the nonparametric Friedman test to examine whether there was a statistically significant difference between the results obtained from the metaheuristic algorithms used in this study. The Friedman test is used to identify differences in the behavior of multiple algorithms. Here, the sample size shows how many results are obtained from each algorithm, and the median value shows the average value of these results. There are two different hypotheses in this test: null hypothesis (H0) and alternative hypothesis (H1). In the experiments, H0 means “There is no significant difference between the fitness function values of the compared algorithms and the fitness function values of the proposed CNIGWO”. H1 means “There is a significant difference between the fitness function values of the compared algorithms and the fitness function values of the proposed CNIGWO”. The results of the analysis performed with the Friedman test are shown in Table 7. The alpha value was determined as 0.05 and degrees of freedom (Df) (number of samples compared 1) was determined as 7. According to the alpha value and degrees of freedom, it can be seen from the chisquare distribution table that the ${x}_{F}^{2}$ value is 14.067. According to this table, it can be seen that the x^{2} value is greater than the ${x}_{F}^{2}$ value (x^{2}>${x}_{F}^{2}$). Accordingly, H0 is rejected and H1 is accepted. In other words, there is a statistically significant difference between the fitness function values of the compared algorithms and the fitness function values of the proposed CNIGWO according to the Friedman test.
CNIGWO  NIGWO  DO  ABC  HHO  SCA  FA  BA  

Sample size  20  20  20  20  20  20  20  20 
Median  0.9989  0.9984  0.7936  0.7957  0.8384  0.7529  0.8384  0.7677 
Sum of ranks  154  146  73.5  88  74  41  76  67.5 
Mean of the ranks  7.7  7.3  3.675  4.4  3.7  2.05  3.8  3.375 
χ^{2}score  90.832836  
Degrees of freedom (statistics)  7  
pvalue  6.66E−16 
Notes:
The result is very siginificant at p < 0.05.
In general, looking at the results, it can be seen that the proposed CNIGWO algorithms perform better than both machine learning algorithms and other metaheuristic algorithms in detecting depression. Looking at the Confusion matrices in Fig. 9, it can be seen that the proposed algorithms have a very high number of correct labeling of the data, even though it is an unbalanced dataset. Similarly, looking at Table 6, it is seen that the proposed CNIGWO algorithms achieved a 100% success rate in accuracy, precision, sensitivity, Fmeasure, and MCC metrics. While other metaheuristic algorithms perform well around 95% in the Accuracy metric, this success rate is around 80% in other metrics. Machine learning algorithms gave worse results. Finally, a nonparametric Friedman test was performed to see whether the proposed algorithm creates a semantic difference from other metaheuristic algorithms used in comparison. Looking at the test results, it can be seen that the proposed algorithm has a statistically significant difference. Based on these, it seems that metaheuristic algorithms achieve successful results in detecting depression. It has been observed that this success rate increases by integrating chaotic maps.
Conclusions
Depression is a psychological impact of modernday living on people’s minds. Identification, prevention, and treatment of depression as well as the determination of relapse risk factors are critical because depression is a serious individual and social health problem due to the risk of suicide and loss of workforce, as well as high chronicity, recurrence rates and prevalence. In this study, new chaosintegrated optimization algorithms are proposed for automatic depression detection. The topic of detecting depression in online social media and networks is approached as an optimization problem for the first time. The problem search area involves social media and network data, and novel chaosintegrated improved optimization methods are applied as solution search strategies and models for automatic depression detection. Besides, in this study, a new version of the IGWO algorithm with chaotic maps integrated is proposed. To increase the success rate, an appropriate encoding type and flexible multiobjective fitness function that can simultaneously optimize accuracy and the number of features is suggested.
Based on the results, it can be concluded that the proposed optimizationbased models outperformed popular machine learning algorithms across accuracy, precision, sensitivity, Fmeasure, and MCC evaluation metrics. In the proposed models, 100% success was achieved in accuracy, precision, sensitivity, Fmeasure, and MCC values. In other current optimization algorithms used for comparison, it is seen that more than 90% success is achieved for the accuracy value. In machine learning algorithms, it is seen that it has a success rate of 75% to 89%. The distributed multipointbased search capacity plays a part in the outstanding performance of the proposed optimization methods. The expression multipointbased search refers to a search process starting from multiple points. This method helps to obtain optimal metric values in a social media dataset where the data is unbalanced, without increasing the computational complexity. Furthermore, although the used social media data are imbalanced, optimum metric values are reached without any additional process that increases the computational complexity. Spread spectrum properties, stochastic properties, irregularity, and ergodic properties of the chaotic maps also seem to be effective in obtaining these promising results. In addition, according to confusion matrices, it can be seen that supervised machine learning algorithms performed worse distributions, while proposed chaotic optimizationbased models performed better. Because they consist of reduced features that directly represent the related words themselves; constructed depression detection models have explainable, comprehensible, and interpretable capabilities that differ from traditional blackbox methods. One of the limitations of the study is the limited number of data in the study. Another limitation is the absence of specialist doctors in the study. In future studies, it is aimed to work with more multicenter data and more experts in the field.
For future studies, adaptive and hybrid versions of the intelligent optimization algorithms can be proposed for better performance in automatic depression detection. The proposed explainable multiobjective metaheuristicbased models can also be used for other social media and network analysis problems. Distributed programming paradigms can be integrated to use this model in big social streaming online data. New different objectives can be added to the current ones, and Paretobased multiobjective methods can be proposed for simultaneously optimizing conflicting and contradicting criteria.