An efficient approach to identifying anti-government sentiment on Twitter during Michigan protests

PeerJ Computer Science


Introduction & motivation

Data preparation

Data collection

  • Operation Gridlock: This was the name given to the first protest. It was organized through a Facebook group of the same name, created by the Michigan Freedom Fund and the Michigan Conservative Coalition. Close to 3,000 people attended, and the protest lasted 8 h. Most protesters stayed in their cars and jammed the streets around the Capitol building, which blocked ambulances from reaching the only Level I trauma center at Sparrow Hospital and caused delays during a shift change at the hospital. About 150 protesters spilled onto the lawn of the Capitol, flouting social distancing and masking guidelines. Protesters carried Confederate, Nazi, and American flags (Berg & Egan, 2020).

  • Michigan Protest: During the second Michigan protest, hundreds of protesters, carrying firearms and dressed in camouflage and military garb, gathered at the Capitol, and many managed to enter the building. Thus, the second protest took a more violent tone. It was organized by the conservative group American Patriot Council. Confederate flags, swastikas, and nooses were present at this protest too (Mauger, 2020).

Data annotation

  • Tactics and Circumstances: These tweets referred to the tactics employed by the protesters and the other circumstances surrounding the protests. These tactics were disruptive and even violent. Anti-government tweets praised them for the inconvenience they caused and the threatening and intimidating situations they produced, whereas non anti-government tweets condemned them for the same reasons.

  • Local Politics: These tweets mentioned local political figures in Michigan, with Governor Whitmer appearing predominantly. The DeVos family, another highly visible Republican presence in Michigan, was believed to have sponsored the protests (Hernandez, 2020). Anti-government tweets denigrated the governor as a dictator and a Nazi, whereas non anti-government tweets stood with her in solidarity.

  • Non-local Politics: These tweets cast the protests in Michigan as a part of the broader landscape and encouraged people in other states and nationally to engage in similar resistance and rallies to ease COVID-19 restrictions. Nationally visible Republican and Democratic leaders and governors of other states were mentioned in these tweets.

  • COVID-19: These tweets explicitly referred to COVID-19. Anti-government tweets questioned the motive behind the public health measures and expressed skepticism about the seriousness of the virus. Non anti-government tweets mostly voiced concern about how these protests, which also defied public health guidelines such as social distancing and masking, would affect the trajectory of the number of cases.

  • Political Ideology: These tweets were ideologically inspired; anti-government tweets praised the protesters as patriots and defenders of individual liberties and freedoms, while non anti-government tweets were critical of the protesters as white supremacists and racists.

Feature computation

Text features

Interaction features

Authors’ features

  • Network strength: The strength of the authors’ network can be assessed by the numbers of friends and followers. Generally, tweets from authors with larger networks of friends and followers can expect greater interaction and visibility. The numbers of friends and followers are compared for authors of both original and quoted tweets. The listed count indicates the number of other users who have added an author to their lists and can be an indicator of popularity (Tweettabs, 2022). It is thus reasonable to believe that tweets of authors with greater listed counts will be more popular and will receive more likes and retweets. The table indicates that only two parameters, namely, the number of friends of the authors of original tweets and the number of followers of the authors of quoted tweets, differ significantly between the two groups. The differences in the other parameters are insignificant.

  • Activity level: One of the main indicators of how active the authors are on the platform is the number of status updates they have shared over the entire period that their accounts have been active. Status updates were compared for the authors of both the original and quoted tweets. Authors of non anti-government tweets have posted a significantly greater number of status updates than the authors of anti-government tweets. However, this difference is insignificant for authors of quoted tweets. Other secondary indicators include the number of times the authors have liked tweets from their friends and followers. Authors who prolifically react to tweets that appear on their feeds are likely to invite similar altruistically reciprocal relationships from their friends and followers (Oehmichen et al., 2019). Thus, the number of likes a tweet receives may have a high positive correlation with the number of tweets its author has liked. However, there is no significant difference between the two groups in the number of likes given by the authors (listed as the number of favorites in Table 6, according to the nomenclature used by the Twitter API).

  • Authenticity/Trust: Tweets from high-profile, celebrated authors may attract a lot more attention, probably because the general public implicitly believes that the content shared from their accounts is more trustworthy and authentic. Moreover, these authors tend to have much larger networks of followers than those who are not celebrities. Popular authors who enjoy celebrity status tend to have accounts verified by Twitter; hence, whether a tweet is shared from a verified account can be a factor in influencing its spread. Thus, the table also compares the percentage of tweets shared from verified accounts for both classes. In absolute terms, the percentage shared from verified accounts is small for both anti-government and non anti-government tweets. However, the difference between the percentages is statistically significant, as indicated by the p-value.
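
A small percentage in both groups can still differ significantly when the samples are large. The source does not state which statistical test produced the p-values, so the following is only a minimal sketch that assumes a two-sided two-proportion z-test. The function name and all counts are hypothetical and are not taken from the paper.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Assumption: this test is chosen for illustration only; the source
    does not say which test was used for the verified-account comparison.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: verified-account tweets out of all tweets per class.
z_stat, p_val = two_proportion_z_test(30, 2000, 90, 2500)
print(f"z = {z_stat:.2f}, p = {p_val:.2g}")
```

With these made-up counts, the two percentages are 1.5% and 3.6%, both small in absolute terms, yet the test rejects equality at the usual 0.05 level.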

Classifiers and performance

  • Random Forests (RF): Random Forests is an ensemble learning method in which the underlying weak learner is a Decision Tree (Liaw & Wiener, 2002). It uses bagging to reduce variance by generating a number of decision trees with different training sets and parameters. The parameters of the model were as follows: each forest consisted of 100 trees, the maximum number of features used to grow each tree was set to the square root of the total number of features (approximately 25–30 when all the features are employed), and the decision trees were not pruned.

  • Support Vector Machines (SVM): Support Vector Machines is a powerful classification technique that estimates the decision boundary (called a hyperplane) with the maximum margin (Suykens, Lukas & Vandewalle, 2000). We used SVMs with both linear and RBF kernels and L2 regularization. The regularization parameter C was set to 1, and the kernel coefficient γ was set to "scale".

  • Logistic Regression (LR): One of the basic and popular algorithms for classification problems, Logistic Regression is named after the logit function that forms its basis. The parameters were: penalty (L1), tolerance for the stopping criterion (0.0001), inverse of the regularization strength C (1.00), and maximum number of iterations (100) (Pedregosa et al., 2011).

  • Multi-Layer Perceptron (MLP): Multi-Layer Perceptron is a feed-forward Artificial Neural Network (ANN) that consists of input, hidden, and output layers (Delashmit & Manry, 2005); the layer sizes were set to 10, 8, 5 and 2, respectively. We used the rectified linear unit (ReLU) instead of the sigmoid activation function to handle the problem that the derivative of the activation function rapidly approaches zero (the vanishing-gradient problem), which is common in deep neural networks.

  • DistilBERT (D-BERT): BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model in which every output is connected to every input, and the weightings between them are dynamically calculated in the attention layers (Devlin et al., 2018). This characteristic allows the model to understand the context of a word from its surrounding words, in contrast to unidirectional NLP models. We employed DistilBERT, a compact version of BERT that has 40% fewer parameters than BERT while preserving over 95% of BERT’s performance (Sanh et al., 2019). The parameters of the DistilBERT model include: vocabulary size (30,522), max position embeddings (6), number of layers (6), number of heads (12), dimensions (768), number of hidden dimensions (3,072), dropout (0.1), attention dropout (0.1) and activation function (gelu) (Sanh et al., 2019).
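
The settings listed above for the four classical classifiers can be written down as a short scikit-learn sketch (the text cites Pedregosa et al., 2011). This is not the authors' code. Several choices are assumptions: the `liblinear` solver, picked only because scikit-learn's default solver does not support an L1 penalty; the reading of "10, 8, 5 and 2" as three hidden layers followed by a two-class output; the `random_state` argument; and the helper name `build_classifiers`. DistilBERT is left out because it needs a separate deep learning toolkit.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_classifiers(random_state=0):
    """Return the classical classifiers configured as described in the text.

    Hypothetical helper: names, solver, and random_state are illustrative.
    """
    return {
        # 100 unpruned trees; sqrt(total features) considered at each split.
        "RF": RandomForestClassifier(
            n_estimators=100,
            max_features="sqrt",
            max_depth=None,
            random_state=random_state,
        ),
        # C = 1 and gamma = "scale" for both kernels.
        "SVM-linear": SVC(kernel="linear", C=1.0, gamma="scale"),
        "SVM-RBF": SVC(kernel="rbf", C=1.0, gamma="scale"),
        # L1 penalty needs a compatible solver; "liblinear" is an assumption.
        "LR": LogisticRegression(
            penalty="l1", tol=1e-4, C=1.0, max_iter=100, solver="liblinear"
        ),
        # Assumed reading: hidden layers of 10, 8 and 5 units, ReLU activation.
        "MLP": MLPClassifier(
            hidden_layer_sizes=(10, 8, 5),
            activation="relu",
            random_state=random_state,
        ),
    }


classifiers = build_classifiers()
for name, model in classifiers.items():
    print(name, type(model).__name__)
```

Each entry exposes the usual `fit` and `predict` methods, so the same feature matrix can be passed to every model in a loop.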

  • Accuracy (A): Accuracy is the percentage of tweets that are labeled correctly.

  • Precision (P): Precision is the percentage of tweets that are actually anti-government out of all the tweets that are predicted as anti-government.

  • Recall (R): Recall is the percentage of actual anti-government tweets that are predicted as anti-government.

  • F1-score (F1): F1-score is the harmonic mean of Precision and Recall and thus balances the two.
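
The four metrics above can be computed from the confusion counts, treating anti-government as the positive class. The following is a minimal sketch in plain Python. The label encoding (1 for anti-government, 0 otherwise), the function name, and the example labels are illustrative assumptions, not data from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary labeling."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # anti-government tweet predicted as anti-government
        elif pred == positive:
            fp += 1  # other tweet predicted as anti-government
        elif truth == positive:
            fn += 1  # anti-government tweet that was missed
        else:
            tn += 1  # other tweet correctly left out
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = anti-government, 0 = non anti-government.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy example, precision is lower than recall because two non anti-government tweets are wrongly flagged, while only one anti-government tweet is missed.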

Results and Discussion

Conclusions and future research

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Hieu Nguyen conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Swapna Gokhale conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The data is available at GitHub: https://github.com/HieuQN/Michigan-Protest.

Funding

The authors received no funding for this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
