PeerJ Computer Science: Social Computing
https://peerj.com/articles/index.atom?journal=cs&subject=11300
Social Computing articles published in PeerJ Computer Science

Special issue on analysis and mining of social media data
https://peerj.com/articles/cs-1909
Published: 2024-02-29
Authors: Arkaitz Zubiaga, Paolo Rosso
This Editorial introduces the PeerJ Computer Science Special Issue on Analysis and Mining of Social Media Data. The special issue called for submissions with a primary focus on the use of social media data, across a variety of fields including natural language processing, computational social science, data mining, information retrieval and recommender systems. Of the 48 abstract submissions that were deemed within the scope of the special issue and were invited to submit a full article, 17 were ultimately accepted. These include a diverse set of articles covering, inter alia, sentiment analysis, detection and mitigation of online harms, analytical studies focused on societal issues, and analysis of images surrounding news. The articles primarily use Twitter, Facebook and Reddit as data sources; English, Arabic, Italian, Russian, Indonesian and Javanese as languages; and over a third of the articles revolve around COVID-19 as the main topic of study. This article discusses the motivation for launching the special issue and provides an overview of the articles published in it.
Multi-grained alignment method based on stable topics in cross-social networks
https://peerj.com/articles/cs-1892
Published: 2024-02-28
Authors: Jing Lu, Qikai Gai
Alignment across social networks is typically performed at two levels: user alignment and group alignment. In user alignment mode, obtaining users’ full features is difficult due to social network privacy protection policies; in group alignment mode, accuracy suffers because of the large number of edge users. To resolve these issues, stable topics are first obtained from user-generated content (UGC) based on embedded topic jitter time, and the weights of user edges are updated using vector distances. An improved Louvain algorithm, called Stable Topic-Louvain (ST-L), is designed to accomplish multi-level community detection without predetermined tags; it obtains fuzzy topic features of each community and finalizes community alignment across social networks. Iterative alignment is then executed from coarse-grained communities to fine-grained sub-communities until user-level alignment is reached. The process can be terminated at any layer to achieve multi-granularity alignment, which resolves the low accuracy of edge-user alignment at a single granularity and improves the accuracy of user alignment. The effectiveness of the proposed method is demonstrated on real datasets.
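To make the edge-reweighting and community detection step concrete, here is a minimal sketch: it reweights user edges by topic-vector similarity and runs standard Louvain community detection via networkx. The topic vectors, graph, and weighting rule are illustrative assumptions, not the paper's actual ST-L implementation.

```python
# Minimal sketch: reweight user edges by stable-topic similarity, then run
# Louvain community detection (networkx >= 2.8). Topic vectors, graph, and
# weighting rule are illustrative assumptions, not the paper's ST-L code.
import networkx as nx
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy user graph with a stable-topic vector attached to each user.
topic_vecs = {
    "a": np.array([0.9, 0.1]), "b": np.array([0.8, 0.2]),
    "c": np.array([0.1, 0.9]), "d": np.array([0.2, 0.8]),
}
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])

# Update edge weights so topically close users cluster together.
for u, v in G.edges():
    G[u][v]["weight"] = cosine_similarity(topic_vecs[u], topic_vecs[v])

# Multi-level community detection on the reweighted graph.
print(nx.community.louvain_communities(G, weight="weight", seed=42))
# e.g., [{'a', 'b'}, {'c', 'd'}]
```

In ST-L, the resulting communities would then be aligned across networks and recursively refined toward user-level alignment.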
A machine learning-based hybrid recommender framework for smart medical systems
https://peerj.com/articles/cs-1880
Published: 2024-02-20
Authors: Jianhua Wei, Honglin Yan, Xiaoli Shao, Lili Zhao, Lin Han, Peng Yan, Shengyu Wang
This article presents a hybrid recommender framework for smart medical systems, introducing two methods to improve service-level evaluations and doctor recommendations for patients. The first method uses big data techniques and deep learning algorithms to develop a registration review system for medical institutions, which outperforms conventional evaluation methods and achieves higher accuracy. The second method applies the term frequency–inverse document frequency (TF-IDF) algorithm to construct a model based on the patient’s symptom vector space, incorporating score weighting, modified cosine similarity, and K-means clustering. Alternating least squares (ALS) matrix decomposition and user-based collaborative filtering are then applied to calculate patients’ predicted scores for doctors and recommend top-performing doctors. Experimental results show significant improvements in precision and recall compared to conventional methods, making the proposed approach a practical solution for department triage and doctor recommendation in medical appointment platforms.
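As an illustration of the retrieval stage described above, the following is a hedged sketch using scikit-learn: TF-IDF symptom vectors, K-means clustering, and cosine similarity for matching a new patient. The corpus, cluster count, and parameters are invented toy values, and the ALS collaborative-filtering stage is omitted.

```python
# Illustrative sketch (scikit-learn): TF-IDF symptom vectors, K-means
# clustering for triage, and cosine similarity for matching a new patient.
# Corpus and parameters are toy assumptions; the ALS stage is omitted.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

symptoms = [
    "headache dizziness nausea",
    "chest pain shortness of breath",
    "headache blurred vision",
    "chest tightness palpitations",
]
vec = TfidfVectorizer()
X = vec.fit_transform(symptoms)  # patient-symptom vector space

# Group patients with similar symptom profiles (a proxy for department triage).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Match a new patient's symptoms against existing profiles.
query = vec.transform(["headache and nausea"])
print(labels, cosine_similarity(query, X).round(2))
```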
Machine learning based framework for fine-grained word segmentation and enhanced text normalization for low resourced language
https://peerj.com/articles/cs-1704
Published: 2024-01-31
Authors: Shahzad Nazir, Muhammad Asif, Mariam Rehman, Shahbaz Ahmad
In text applications, pre-processing is a significant step that enhances the outcomes of natural language processing (NLP) tasks. Text normalization and tokenization are two pivotal pre-processing procedures whose importance cannot be overstated. Text normalization refers to transforming raw text into standardized script, while word tokenization splits the text into tokens or words. Well-defined normalization and tokenization approaches exist for most widely spoken languages in the world; however, Urdu, the world’s 10th most widely spoken language, has been overlooked by the research community. This research presents improved text normalization and tokenization techniques for the Urdu language. For Urdu text normalization, multiple regular expressions and rules are proposed, including removing diacritics, normalizing single characters, and separating digits. For word tokenization, core features are defined and extracted for each character of the text, and a machine learning model combined with handcrafted rules is used to predict spaces and tokenize the text. The experiments are performed on the largest human-annotated dataset composed in Urdu script, created for this work and covering five different domains. The results are evaluated using precision, recall, F-measure, and accuracy, and compared with the state of the art. The normalization approach produced a 20% improvement and the tokenization approach a 6% improvement.
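A minimal sketch of the rule-based normalization idea follows, assuming illustrative regular expressions for diacritic removal and digit separation; the paper's full rule set and ML tokenizer are not reproduced here.

```python
# Hedged sketch of the normalization rules: the Unicode ranges and rules are
# illustrative, not the paper's full rule set.
import re

DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")            # Arabic-script marks
DIGIT_BOUNDARY = re.compile(r"(?<=\d)(?=\D)|(?<=\D)(?=\d)")  # digit/letter seam

def normalize(text: str) -> str:
    text = DIACRITICS.sub("", text)        # remove diacritics
    text = DIGIT_BOUNDARY.sub(" ", text)   # separate digits from words
    return re.sub(r"\s+", " ", text).strip()

print(normalize("سال2024 میں"))  # -> "سال 2024 میں"
```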
Mining human periodic behaviors via tensor factorization and entropy
https://peerj.com/articles/cs-1851
Published: 2024-01-31
Authors: Feng Yi, Lei Su, Huaiwen He, Tao Xiao
Understanding human periodic behaviors is crucial in many applications. Existing research has shown the existence of periodicity in human behaviors, but has achieved limited success in leveraging location periodicity and obtaining satisfactory accuracy for oscillations in human periodic behaviors. In this article, we propose the Mobility Intention and Relative Entropy (MIRE) model to address these challenges. We employ tensor decomposition to extract mobility intentions from spatiotemporal datasets, thereby revealing hidden structures in users’ historical records. Subsequently, we utilize subsequences associated with the same mobility intention to mine human periodic behaviors. Furthermore, we introduce a novel periodicity detection algorithm based on relative entropy. Our experimental results, conducted on real-world datasets, demonstrate the effectiveness of the MIRE model in accurately uncovering human periodic behaviors. Comparative analysis further reveals that the MIRE model significantly outperforms baseline periodicity detection algorithms.
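To illustrate the relative-entropy idea, here is a small sketch that scores a candidate period by the KL divergence of the event-phase histogram from uniform; the binning, data, and scoring details are illustrative assumptions rather than the MIRE model's exact algorithm.

```python
# Sketch: score a candidate period T by the KL divergence (relative entropy)
# of the event-phase histogram from uniform. Binning and data are toy
# assumptions, not the MIRE model's exact algorithm.
import numpy as np

def periodicity_score(times, period, bins=24):
    phases = np.mod(times, period) / period            # phase in [0, 1)
    hist, _ = np.histogram(phases, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    q = 1.0 / bins                                     # uniform reference
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q)))    # KL(p || uniform)

rng = np.random.default_rng(0)
# Events around 9:00 every day (times in hours) -> strong 24 h periodicity.
times = np.arange(100) * 24.0 + rng.normal(9.0, 0.5, 100)
print(periodicity_score(times, 24.0) > periodicity_score(times, 17.0))  # True
```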
A new cooperative control solution of subway BAS: an improved fuzzy PID control algorithm
https://peerj.com/articles/cs-1765
Published: 2024-01-22
Authors: Hui Fang, Shusong Yang, Ying Shi, Yang Wang, Yue Jiang, Chaochao Song, Wei Zhang
The building automation system (BAS) of a subway is the core component for monitoring and managing urban rail transit systems. To address current problems of metro BAS such as low control efficiency, insufficient accuracy, and poor stability, this article proposes a cooperative control framework based on an improved fuzzy proportional-integral-derivative (PID) algorithm. Firstly, the concept of an integrated supervisory control system (ISCS) for subways is introduced by summarizing previously implemented engineering construction and combining it with advanced automation technology. The system’s overall design under the ISCS framework is also improved by integrating it with the fire alarm system (FAS), with the BAS as the core unit. Then, an improved seeker optimization algorithm (ISOA) is employed to optimize the parameters of the fuzzy PID control algorithm and achieve coordinated control of the system while accounting for the time-lag problem. Finally, the accuracy, efficiency, and stability of the coordinated control response of the BAS under the ISCS framework are tested experimentally. The results suggest that the proposed cooperative control solution employing the improved fuzzy PID algorithm offers good control accuracy and response efficiency while ensuring higher stability during coordinated control. It thus greatly improves the automation level of the subway and provides safer, more reliable high performance for the ISCS in the urban rail transportation industry.
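For readers unfamiliar with the control structure, the following is a bare-bones discrete PID loop; in the article, the gains would be produced by fuzzy rules and tuned by ISOA, whereas here they are fixed illustrative values driving a toy first-order plant.

```python
# Bare-bones discrete PID loop. In the article, the gains would come from
# fuzzy rules tuned by ISOA; here they are fixed illustrative values.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def step(self, setpoint, measured):
        err = setpoint - measured
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Drive a toy first-order plant (e.g., a duct temperature) toward 22 °C.
pid, temp = PID(kp=2.0, ki=0.5, kd=0.1, dt=1.0), 15.0
for _ in range(200):
    u = pid.step(22.0, temp)
    temp += 0.1 * (u - (temp - 15.0))  # invented plant dynamics
print(round(temp, 2))  # close to the 22.0 setpoint
```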
A novel incomplete hesitant fuzzy information supplement and clustering method for large-scale group decision-making
https://peerj.com/articles/cs-1803
Published: 2024-01-16
Authors: Jingdong Wang, Wenhui Wang, Fanqi Meng, Peifang Wang, Xuesong Wang, Shuang Wei, Tong Liu, Shuaisong Yang
Clustering is an effective means to reduce the scale of large-scale group decision-making (LSGDM). However, clustering methods face many problems, such as the incomplete or ambiguous information usually provided by different decision makers. Traditional clustering methods may not handle these situations effectively, resulting in incomplete decision-making information; calculating the clustering centers may become complex and time-consuming; and inappropriate distance weights may lead to incorrect cluster assignments. These problems seriously affect the clustering results. To address these difficulties, this research provides a novel incomplete hesitant fuzzy information supplement and clustering approach for large-scale group decision-making. First, the approach accounts for trust degradation and the inhibiting effect of distrust relationships during trust propagation, and then builds global and local trust networks. A novel supplement formula is provided that takes into account the decision maker’s preference information as well as the trusted neighbors’ information, allowing the neighbors’ recommendations to be realized. An improved distance function is then proposed that calculates weights by combining relative standard deviation theory, and clustering centers are selected using density peaks, which optimizes the selection of clustering centers and reduces the complexity and scale of the decision. Finally, an example demonstrates how the proposed method can be applied, and a consistency index and comparison experiments are used to evaluate whether the suggested approach is effective and reliable.
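A small sketch of trust propagation with per-hop degradation and distrust inhibition follows, under the simplifying assumptions that trust decays multiplicatively per hop and that any distrust edge blocks a path; the graph and decay factor are invented.

```python
# Sketch: trust decays multiplicatively per hop, and any distrust edge
# (negative weight) blocks the whole path. Graph and decay factor invented.
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("a", "b", 0.9), ("b", "c", 0.8),   # trust edges
    ("a", "d", -0.6), ("d", "c", 0.9),  # a distrusts d: path via d is blocked
])
DECAY = 0.8  # per-hop trust degradation

def propagated_trust(G, src, dst):
    best = 0.0
    for path in nx.all_simple_paths(G, src, dst):
        weights = [G[u][v]["weight"] for u, v in zip(path, path[1:])]
        if any(w < 0 for w in weights):  # distrust inhibits propagation
            continue
        t = 1.0
        for w in weights:
            t *= w * DECAY
        best = max(best, t)
    return best

print(round(propagated_trust(G, "a", "c"), 4))  # 0.9*0.8 * 0.8**2 = 0.4608
```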
Named entity recognition and emotional viewpoint monitoring in online news using artificial intelligence
https://peerj.com/articles/cs-1715
Published: 2024-01-10
Authors: Manzi Tu
Online news is an important way for netizens to obtain social information, but the sheer volume of news makes it difficult to find key information. Named entity recognition (NER) technology based on artificial intelligence can classify places, dates, and other information in text. This article combines named entity recognition with deep learning technology. Specifically, the proposed method introduces an automatic annotation approach for Chinese entity triggers and a NER model that can achieve high accuracy with a small amount of training data. The method jointly trains sentence and trigger vectors through a trigger-matching network, utilizing the trigger vectors as attention queries for subsequent sequence annotation models. Furthermore, the proposed method employs entity labels to effectively recognize neologisms in web news, enabling customization of the set of sensitive words and of the number of words to be detected within the set, as well as extending the web news sentiment lexicon for sentiment observation. Experimental results demonstrate that the proposed model outperforms the traditional BiLSTM-CRF model, achieving superior performance with only 20% of the training data, compared to the 40% required by the conventional model. Moreover, the loss function curve shows that the proposed model exhibits better accuracy and faster convergence than the compared model. Finally, the proposed model achieves an average accuracy of 97.88% in sentiment viewpoint detection.
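As a toy illustration of using a trigger vector as an attention query over token representations, consider the sketch below; the vectors are random stand-ins, whereas the actual model learns sentence and trigger vectors jointly through the trigger-matching network.

```python
# Toy illustration: a trigger vector as an attention query over token
# representations. Vectors are random stand-ins; the real model learns
# sentence and trigger vectors jointly via the trigger-matching network.
import numpy as np

rng = np.random.default_rng(1)
tokens = rng.normal(size=(6, 8))   # 6 token embeddings of dimension 8
trigger = rng.normal(size=8)       # trigger embedding (attention query)

scores = tokens @ trigger / np.sqrt(8)   # scaled dot-product scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax over the 6 tokens
context = weights @ tokens               # trigger-aware sentence summary

print(weights.round(3), context.shape)   # attention weights and (8,) vector
```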
Art appreciation model design based on improved PageRank and ECA-ResNeXt50 algorithm
https://peerj.com/articles/cs-1734
Published: 2023-12-19
Authors: Hang Yang, Jingyao Chen
Image sentiment analysis technology can predict, measure, and understand human emotional experience through images. Aiming at the problem of extracting emotional characteristics in art appreciation, this article puts forward an innovative method. Firstly, the PageRank algorithm is enhanced using tweet content similarity and time factors; secondly, following the SE-ResNet design, Efficient Channel Attention (ECA) is integrated with the residual network structure, and ResNeXt50 is optimized to enhance the extraction of image sentiment features. Finally, the weight coefficients of overall emotions are dynamically adjusted to select a specific emotion incorporation strategy, resulting in effective bimodal fusion. The proposed model demonstrates exceptional performance in predicting sentiment labels, with a maximum classification accuracy of 88.20%, a 21.34% improvement over the traditional deep convolutional neural network (DCNN) model that attests to the effectiveness of this study. This research enriches emotion feature extraction capabilities for images and texts and improves the accuracy of emotion fusion classification.
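To sketch the "PageRank plus time factor" idea, the example below damps edge weights by target-content age before running weighted PageRank; the graph, ages, and decay rate are invented, and the tweet-content-similarity factor described in the abstract is omitted for brevity.

```python
# Sketch: damp edge weights by target-content age, then run weighted
# PageRank. Graph, ages, and the 0.1 decay rate are invented; the
# content-similarity factor from the abstract is omitted.
import math
import networkx as nx

edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p1"), ("p1", "p3")]
age_days = {"p1": 1, "p2": 10, "p3": 3}  # age of each post

G = nx.DiGraph()
for u, v in edges:
    # Links to fresher content carry more weight (exponential time decay).
    G.add_edge(u, v, weight=math.exp(-0.1 * age_days[v]))

ranks = nx.pagerank(G, alpha=0.85, weight="weight")
print(sorted(ranks, key=ranks.get, reverse=True))  # freshness-biased order
```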
Research on the control strategies of data flow transmission paths for MPTCP-based communication networks
https://peerj.com/articles/cs-1716
Published: 2023-12-06
Authors: Zhong Shu, Hua-Bing Du, Xin-Yu Zhu, Shi-Xin Ruan, Xian-Ran Li
This work improves the performance of multipath transmission control protocol (MPTCP) subflows through an enhancement mechanism for MPTCP communication. When multiple MPTCP subflows occupy the same transmission path, critical issues such as multipath selection and optimization and efficient scheduling of the multiple available paths are addressed by incorporating software-defined networking (SDN), built around four key parameters: network transmission bandwidth, transmission paths, path capacity, and network latency. Critical equipment such as the network physical device layer and the SDN controller is integrated with these four parameters, so the network model defines the transmission control process and data information. Given the predetermined total network bandwidth capacity for selecting multiple paths, the adequate bandwidth capacity is determined by defining the data transfer rate between MPTCP terminals and MPTCP servers, excluding the processing latency of the OpenFlow switch and the SDN controller. Effective network transmission paths are calculated through two rounds of path selection algorithms. Moreover, according to the demand capacity of the data transmission and the supply capacity of the occupied network resources, a supply-and-demand strategy is formulated that considers the total network bandwidth capacity and invalid network latency factors, and the available transmission paths are then selected from the valid ones. The shortest-path problem, i.e., calculating and sorting the shortest paths, is transformed into a clustering problem, Inter-Cluster Average Classification (ICA). OpenFlow communication flow instructions are designed to schedule MPTCP subflows. The proposed method thus addresses various validation objectives, including the network model, effective network latency, effective transmission paths, supply-demand strategies, ineffective transmission paths, shortest feasible paths, and communication rules; its reliability, stability, and data transmission performance are validated through comparative analysis with conventional algorithms. The results show that the network latency is around 20 s, the network transmission rate is approximately 10 Mbps, the network bandwidth capacity reaches around 25 Mbps, the network resource utilization rate is about 75%, and the network throughput is approximately 3 M/s.
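An illustrative two-round path selection in the spirit of the abstract follows: links are first filtered by residual bandwidth against the subflow's demand, and the remaining paths are ranked by total delay. The topology, capacities, and delays are invented toy values, and the ICA clustering step is not reproduced.

```python
# Illustrative two-round path selection: filter links by residual bandwidth,
# then rank feasible paths by total delay. Topology and numbers are invented;
# the ICA clustering step is not reproduced.
import networkx as nx

G = nx.Graph()
G.add_edge("h1", "s1", bw=25, delay=2)
G.add_edge("s1", "s2", bw=10, delay=5)
G.add_edge("s1", "s3", bw=25, delay=3)
G.add_edge("s2", "h2", bw=10, delay=2)
G.add_edge("s3", "h2", bw=25, delay=4)

DEMAND = 15  # Mbps required by the MPTCP subflow

# Round 1: keep only links whose residual bandwidth can carry the demand.
feasible = nx.Graph(
    (u, v, d) for u, v, d in G.edges(data=True) if d["bw"] >= DEMAND
)

# Round 2: rank the remaining simple paths by total delay.
paths = sorted(
    nx.all_simple_paths(feasible, "h1", "h2"),
    key=lambda p: nx.path_weight(feasible, p, weight="delay"),
)
print(paths[0])  # ['h1', 's1', 's3', 'h2']
```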