Normalized effect size (NES): a novel feature selection model for Urdu fake news classification

PeerJ Computer Science


Introduction

  1. Proposal of a robust feature selection approach to discriminate between fake and real news;

  2. Comparison of the performance of the proposed feature metric with seven well-known feature selection methods, showing the high performance of our feature metric;

  3. Analysis of the performance of the proposed feature selection metric on two benchmark Urdu fake news datasets.

Literature Review

Proposed Methodology

Feature extraction

Term weighting schemes

Feature selection (FS) methods

Normalized difference measure (NDM)

  • An important term should have a high |tpr − fpr| value.

  • Either tpr or fpr should be close to zero.

  • If two terms have equal |tpr − fpr| values, the term with the lower min(tpr, fpr) value should be ranked higher, where min returns the smaller of the two values.
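The three properties above can be illustrated with a small sketch. The NDM formula itself is not reproduced in this section, so the code assumes the commonly cited form, |tpr − fpr| / min(tpr, fpr). The function name `ndm`, its count-based arguments, and the `eps` floor used to avoid division by zero are illustrative choices, not taken from the article.

```python
def ndm(tp, fn, fp, tn, eps=1e-6):
    """Normalized difference measure for one term (assumed form).

    tp: positive documents containing the term
    fn: positive documents not containing the term
    fp: negative documents containing the term
    tn: negative documents not containing the term
    eps: hypothetical floor for the denominator, so that a term with
         min(tpr, fpr) == 0 does not cause a division by zero
    """
    tpr = tp / (tp + fn)  # true positive rate
    fpr = fp / (fp + tn)  # false positive rate
    denominator = max(min(tpr, fpr), eps)
    return abs(tpr - fpr) / denominator


# Two terms with the same |tpr - fpr| = 0.3 but different min(tpr, fpr).
term_a = ndm(tp=4, fn=6, fp=1, tn=9)  # tpr = 0.4, fpr = 0.1
term_b = ndm(tp=8, fn=2, fp=5, tn=5)  # tpr = 0.8, fpr = 0.5
print(term_a > term_b)
```

Under this assumed form, the term whose smaller rate is nearer to zero (term_a) receives the higher score, which matches the third property.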

Bi-normal separation (BNS)

Odds ratio (OR)

Gini

Distinguished feature selector (DFS)

Information Gain (IG)

Chi square (Chi)

Proposed feature selection measure: normalized effect size (NES)

  • μ+ is the mean of the TFIDF score of a term across all documents labeled as positive

  • μ− is the mean of the TFIDF score of a term across all documents labeled as negative

  • σ+ is the standard deviation of the TFIDF score of a term across all documents labeled as positive

  • σ− is the standard deviation of the TFIDF score of a term across all documents labeled as negative
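To show how the class-wise means and standard deviations listed above might be computed and combined, here is a minimal sketch. The NES formula is not given in this section, so the code does not claim to reproduce it. It computes a generic standardized mean difference (a Cohen's-d-style effect size with a simple pooled standard deviation) as a stand-in. The function names, the 1/0 label encoding, the use of the population standard deviation, and the `eps` term are all assumptions made for illustration.

```python
from math import sqrt
from statistics import fmean, pstdev


def class_stats(scores, labels):
    """Return (mu_pos, mu_neg, sd_pos, sd_neg) for one term.

    scores: TFIDF score of the term in each document
    labels: 1 for positive documents, 0 for negative (assumed encoding)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return fmean(pos), fmean(neg), pstdev(pos), pstdev(neg)


def standardized_difference(mu_pos, mu_neg, sd_pos, sd_neg, eps=1e-12):
    """Generic effect size: |mu_pos - mu_neg| over a pooled deviation.

    This is a stand-in, not the article's NES; eps is a hypothetical
    guard against a zero denominator.
    """
    pooled = sqrt((sd_pos ** 2 + sd_neg ** 2) / 2.0)
    return abs(mu_pos - mu_neg) / (pooled + eps)


# Toy example: a term that scores high in positive documents only.
scores = [0.9, 0.7, 0.1, 0.3]
labels = [1, 1, 0, 0]
print(standardized_difference(*class_stats(scores, labels)))
```

A term whose TFIDF scores are well separated between the two classes, relative to their spread, gets a large value. A term with identical class means gets zero.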

Classification model

Experimental evaluation

Evaluation corpus

Results and comparison

Conclusion

Additional Information and Declarations

Competing Interests

Ivan Miguel Pires is an Academic Editor for PeerJ Computer Science.

Author Contributions

Muhammad Wasim conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Sehrish Munawar Cheema conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Ivan Miguel Pires conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The code is available at GitHub and Zenodo:

- https://github.com/dr-m-wasim/UrduFakeNewsFS.

- Muhammad Wasim, Sehrish Munawar Cheema, & Ivan Miguel Pires. (2023). Normalized Effect Size (NES): a novel feature selection model for Urdu fake news classification. https://doi.org/10.5281/zenodo.8320957.

The BET Dataset is available at GitHub: https://github.com/MaazAmjad/Datasets-for-Urdu-news.

Institution: Natural Language and Text Processing Laboratory, Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN), Ciudad de México (Mexico City), Mexico

Contact: Maaz Amjad (maazamjad@phystech.edu)

The UFN Dataset is available at GitHub: https://github.com/pervezbcs/Urdu-Fake-News.

Institution: Department of Humanities and Basic Sciences, MCS, National University of Sciences and Technology, Islamabad, Pakistan

Contact: Farkhanda Afzal (farkhanda@scm.edu.pk)

Funding

This work is funded by FCT/MEC through national funds and co-funded by FEDER—PT2020 partnership agreement under the project UIDB/50008/2020. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
