All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The reviewers appreciated the recent changes to the article, so I recommend it for acceptance.
[# PeerJ Staff Note - this decision was reviewed and approved by Shawn Gomez, a PeerJ Section Editor covering this Section #]
The author addressed all the points that were raised.
-
-
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
1. The author conducted an analysis of fake news detection using various machine learning algorithms. The overall content is presented clearly and is generally easy to understand.
2. However, the literature review is insufficient to support the findings. A more comprehensive review of recent and relevant research articles is necessary to validate the conclusions drawn in the study.
1. The author employed only basic machine learning algorithms for the fake news detection task. While this provides a foundational understanding of the problem, it falls short of capturing the complexity and evolving nature of fake news dissemination.
2. Given the significance and depth of the topic, a more rigorous analysis is warranted—preferably involving advanced algorithms such as ensemble methods, deep learning models (e.g., LSTM, BERT), or hybrid approaches that combine linguistic, semantic, and contextual analysis.
3. The current approach limits the potential for achieving higher accuracy and robustness in the results. Incorporating more sophisticated techniques would not only enhance the performance of the model but also provide deeper insights into the patterns and characteristics of fake news.
1. No novelty found.
2. Dialect Generalization: Limitations note challenges with Arabic dialects (MSA vs. regional variants), but no solutions are explored.
3. Contextual Features: Focuses on text alone; user engagement data (e.g., social media interactions) could improve detection, as cited in Related Work.
4. Ethical Considerations: Briefly mentions privacy/bias but lacks depth on ethical implications of automated detection in Arabic contexts.
Lack of Novelty: The comparative analysis of NB, RF, and NN is conventional; no innovative model or hybrid approach is proposed.
Please provide a clear discussion of threats to validity.
1. Model Innovation: Explore hybrid models (e.g., NB + NN) or ensemble methods to boost performance (see the sketch after this list).
2. Dialect Handling: Incorporate dialect-specific preprocessing or transfer learning.
3. Multimodal Data: Integrate metadata (e.g., source credibility) or user behavior metrics.
4. Ethical Discussion: Expand on bias mitigation and privacy concerns specific to Arabic content.
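As an illustration of point 1, a soft-voting ensemble combining NB, RF, and a small feed-forward network could look like the following minimal sketch (assuming scikit-learn and a TF-IDF representation; all estimator choices and parameters here are hypothetical, not the authors' setup):

```python
# Minimal sketch of a voting ensemble over NB, RF, and a small feed-forward NN.
# Assumes scikit-learn; training texts and labels are placeholders.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),              # word uni/bigrams
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
            ("nn", MLPClassifier(hidden_layer_sizes=(128,), max_iter=50)),
        ],
        voting="soft",                                 # average predicted probabilities
    ),
)
# ensemble.fit(train_texts, train_labels)
# predictions = ensemble.predict(test_texts)
```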
The authors need to cite the references properly.
The preprocessing pipeline is tailored to Arabic linguistic complexities, but a normalization step is missing.
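For example, a typical Arabic normalization step (a sketch of common practice, not the authors' actual pipeline) unifies alef variants, removes diacritics and tatweel, and maps alef maqsura and taa marbuta to canonical forms:

```python
import re

# Sketch of a common Arabic normalization step (not the authors' actual pipeline).
DIACRITICS = re.compile(r"[\u0617-\u061A\u064B-\u0652]")  # harakat and related marks

def normalize_arabic(text: str) -> str:
    text = DIACRITICS.sub("", text)            # remove diacritics
    text = text.replace("\u0640", "")          # remove tatweel (kashida)
    text = re.sub("[إأآا]", "ا", text)          # unify alef variants
    text = text.replace("ى", "ي")              # alef maqsura -> yaa
    text = text.replace("ة", "ه")              # taa marbuta -> haa
    return text
```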
The authors have framed the article well, but many experimental results are missing.
Why did the author choose only three approaches (NB, RF, and NN)? A comparison of at least five ML algorithms should be provided for a research article.
The novelty of the work should be improved.
Most papers in the recent era achieve a maximum accuracy of about 93%; the authors should explain how NB achieves the 98% accuracy shown in Table 4 and Table 5.
- The Introduction explains the motivation for the research well, but the Related Work section is superficial and contains many older works, so it does not reflect the up-to-date state of the art. The description of the selected papers is unsystematic and does not always mention the results achieved by the other authors; in some places the methods used are not even mentioned, only the problem addressed. It is not clear whether the authors used the state-of-the-art analysis to select methods for their experiments, since the related works contain more efficient methods for processing large volumes of textual data than the ones chosen by the authors. Additionally, nowhere in the paper is it stated which specific neural network the authors used for the experiments.
- The manuscript follows a standard structure, but figures and tables lack detailed explanations. All tables have the same captions at the end of the article. In the text of the article, the captions are very similar; the only difference between them is the fold number:
Table 2: K-fold-1 results 75% training and 25% testing
Table 3: K-fold-2 results 75% training and 25% testing
Table 4: K-fold-3 results 75% training and 25% testing
Table 5: K-fold-4 results 75% training and 25% testing
There is no explanation of what this fold number means. It may relate to 4-fold cross-validation (25 percent for testing), but the authors claim in line 307 that they used 10-fold cross-validation.
- Figure 1 is informationally poor and is not useful. Moreover, it does not contain a formula that models the Naive Bayes classifier, only the Bayes theorem from which the NB model was derived. Figure 2 contains the simplest possible illustration of a neural network; it should be replaced by an illustration of the network the authors actually used, which is not specified anywhere. Figure 3 illustrates Random Forest as a forest of copies of the same tree, but RF is powerful precisely because it generates a set of decorrelated decision trees. Figure 4 contains only formulas and therefore makes little sense as a figure; those formulas must be included in the paper as equations.
- The paper does not clearly articulate how its findings contribute to the field. One clear contribution is the creation of the dataset AFND. The other contributions and the way the methods were selected are questionable.
- Section “3-Research objective” is extremely short. There is only one objective, which is not innovative.
- Please explain the concept of “simulated neural networks (SNNs)” and how they differ from “artificial neural networks (ANNs)”.
- Keywords could be better selected; the last two are very similar. There is often a problem of missing spaces in the text.
- Lines 111-114: The authors claim that “(Singhania, 2022) introduced … and (Liu, 2020) further explored …”, but the year 2020 is before 2022.
- The article content is within the Aims and Scope of the journal, but the manuscript is not clearly written and is not written in a professional way. It does not meet high technical standards.
- Methods are not described in sufficient detail. Please explain why you used these methods (NB, RF, NN). It is questionable that NB is the best, as was claimed. It is hard to believe that methods like MLP, CNN, RNN, LSTM, GRU, BERT (RoBERTa, DistilBERT, …), T5, mT5, byT5 and so on would give worse results on a large dataset than NB. Please specify which neural network was used for your experiments. If the simplest of neural networks (as illustrated in Figure 2) was used, then it is clear why poor results were achieved.
- Regardless of whether 4-fold or 10-fold cross-validation was used, the mean over all folds along with the standard deviation should also be reported.
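For illustration, per-fold scores and their mean and standard deviation could be reported with something like the following (a sketch assuming scikit-learn; X and y stand for the prepared feature matrix and labels, not the authors' actual variables):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Sketch: report per-fold accuracy plus the mean and standard deviation.
# X and y are placeholders for the prepared feature matrix and labels.
scores = cross_val_score(MultinomialNB(), X, y, cv=10, scoring="accuracy")
print("per-fold accuracies:", np.round(scores, 4))
print(f"mean = {scores.mean():.4f}, std = {scores.std():.4f}")
```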
- The statistics in Table 1 are not correct. Summing the numbers of Credible, Non-credible, and Undecided articles gives 440,413 articles in total, not 606,912 as stated in Table 1.
- Please explain how the representation of the data by unigrams, bigrams, and trigrams was used to prepare the input to the selected neural network.
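For instance, a combined unigram/bigram/trigram representation could be built as follows (a sketch assuming scikit-learn; `documents` and the feature limit are placeholders, not the authors' code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Sketch: build a combined unigram/bigram/trigram representation.
# `documents` is a placeholder for the preprocessed Arabic articles.
vectorizer = TfidfVectorizer(ngram_range=(1, 3), max_features=50_000)
X = vectorizer.fit_transform(documents)   # sparse matrix: one row per article
# X could then be fed to NB/RF directly, or densified/embedded for a neural network.
```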
- Your description of the training methodology needs more detail (lines 305-309). How exactly was the dataset divided into training/validation/test sets for the NN? How was it divided for RF and NB? What does “shuffled using random state of 42 prior to training” mean?
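For clarity, “shuffled using random state of 42” would usually correspond to something like the following (a sketch assuming scikit-learn; the 80/10/10 split ratios are an assumption for illustration, not taken from the manuscript):

```python
from sklearn.model_selection import train_test_split

# Sketch: deterministic shuffle and split with a fixed random seed.
# The 80/10/10 train/validation/test ratios are an assumption for illustration only.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, shuffle=True, random_state=42, stratify=y_tmp)
```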
- The description of the NN architecture used should be added, for example in a table specifying all layers together with their parameters. Please explain how the parameters of the NN were tuned.
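For example, such a table could mirror a layer-by-layer model summary (a purely illustrative Keras sketch; the layer sizes below are hypothetical, not the network used in the manuscript):

```python
from tensorflow import keras

# Illustrative sketch of specifying an NN architecture layer by layer.
# The sizes below are hypothetical, not the network used in the manuscript.
model = keras.Sequential([
    keras.layers.Input(shape=(50_000,)),              # e.g., a TF-IDF feature vector
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),      # credible / non-credible
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # prints the layer/parameter table that could go in the paper
```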
- In lines 341-342 there is the statement “… our evaluation revealed no false prediction at the class level …”. This would mean that 100% accuracy and F1-score were achieved, which was not the case. This needs to be explained.
- Conclusions are not entirely supported by the results, for example the claim that the “RF model displayed some sensitivity to certain dataset features” in lines 331-332. The experiments and evaluations are not performed satisfactorily.
- The Discussion and Conclusion parts are shallow, mainly repeating the achieved results. Two limitations are mentioned, but further research is described in only one sentence, with nothing about other possibilities of using the newest methods and approaches.
- The discussion is superficial, making it difficult to determine the contributions of the paper. The creation of a new large dataset covering different varieties of Arabic (AFND) is the only clearly valuable contribution of the research presented in the paper.
No comment.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.