All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Dear Author,
Your paper has been accepted for publication in PeerJ Computer Science. Thank you for your fine contribution.
[# PeerJ Staff Note - this decision was reviewed and approved by Jyotismita Chaki, a PeerJ Section Editor covering this Section #]
Dear Author,
Your revised paper has been reviewed. It still needs major revisions before it can be accepted for publication in PeerJ Computer Science. I recommend that you address the reviewer's indications in the revised version of your paper; in particular, the reviewer commented that:
1) The author has only specified the problem, whereas the introduction must be revised in its entirety. Moreover, the author responded to the remarks superficially, while substantive work remains to be done in revising the article.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
The author has only specified the problem, whereas the introduction must be revised in its entirety.
The author responded to the remarks superficially, while substantive work remains to be done in revising the article.
The results are good, but the work does not address the topical issue of context-dependent spam, which needs to be tackled.
Dear Authors,
Your paper has been reviewed. It needs major revisions before it can be accepted for publication in PeerJ Computer Science. More precisely:
1) In the study, the authors applied CountVectorizer to transform the raw text into feature vectors of word counts. Due to drawbacks such as ignoring word meaning and context and assigning equal weight to each word, this technique is inefficient compared with transformer models (BERT, RoBERTa, DistilBERT, etc.). You must clarify the motivation for this choice in the revised version of the paper.
2) The study applied an oversampling technique to address class imbalance, but spam is naturally less frequent than ham. Addressing the imbalance means that the real-world distribution of the data is not preserved, which matters for realistic model performance and deployment. You should also model without addressing the imbalance and evaluate performance using metrics such as F1-score and/or precision-recall AUC instead of accuracy alone.
3) Several details are missing from the Experiments section (a sketch of the expected level of reporting follows this list):
- Exact parameters for algorithms (e.g., the number of trees in Random Forest, kernel types in SVM, architecture details for MLP).
- Precise configuration for data preprocessing steps (e.g., specific stemming algorithm, stopword lists).
- Details of cross-validation procedures (e.g., stratification method, random seed, number of folds).
- Implementation details such as software versions, libraries used, or code snippets would significantly enhance reproducibility.
- Clarification on how hyperparameters were selected or tuned (grid search, default settings, etc.).
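For instance, a minimal sketch, assuming scikit-learn, of the kind of explicit configuration reporting requested above; every value below is a hypothetical placeholder, not the authors' actual setting:

```python
# Hypothetical configuration reporting; all values are placeholders that
# illustrate the level of detail requested, not the paper's settings.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

SEED = 42  # fixed random seed, reported for reproducibility

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=SEED),  # number of trees
    "SVM": SVC(kernel="rbf", C=1.0, random_state=SEED),                 # kernel type
    "MLP": MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300,
                         random_state=SEED),                            # architecture
}
```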
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**PeerJ Staff Note:** Your submission appears to have been at least partially authored or edited by a generative AI/large language model. When you submit your revision, please detail whether (and if so, how) AI was used in the construction of your manuscript in your response letter, AND mention it in the Acknowledgements section of your manuscript.
1. There are too many bulleted sections in the write-up, which suggests that those sections were likely written using AI. The author should own the reporting, and it should be original.
This study focuses on classifying SMS messages as either legitimate (ham) or illegitimate (spam). The issue it addresses is dynamic and demands ongoing efforts to effectively manage it. However, I would like the author to consider the following points:
1. In the study, the authors applied CountVectorizer to transform the raw text into feature vectors of word counts. This technique is inefficient due to drawbacks such as ignoring word meaning and context and assigning equal weight to each word. Applying transformer embeddings (BERT, RoBERTa, DistilBERT, etc.) is a more efficient and modern approach: it factors in context, meaning, word order, and relationships, and handles synonyms and semantic similarity. Transformer models are context-aware, and their application would lead to a more robust and generalizable SMS spam classifier; a minimal sketch of this approach is given below.
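A minimal sketch, assuming the Hugging Face `transformers` library and the `distilbert-base-uncased` checkpoint, of how transformer embeddings could replace CountVectorizer features. This illustrates the reviewer's suggestion, not the authors' method:

```python
# Context-aware SMS embeddings via DistilBERT (illustrative sketch).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

def embed(texts):
    # Tokenize a batch of SMS messages and mean-pool the last hidden states
    # into one fixed-length, context-aware vector per message.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)   # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)

features = embed(["Free entry! Text WIN to claim", "See you at lunch?"])
```

These vectors can then be fed to any of the classifiers already used in the study in place of the word-count matrix.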
2. The author applied an oversampling technique to address class imbalance, but in reality spam is naturally less frequent than ham. Addressing the imbalance means that the real-world distribution of the data is not preserved, which is very important for realistic model performance and deployment. It is advisable for the author to model without addressing the imbalance and to evaluate model performance on metrics such as F1-score and/or precision-recall AUC instead of accuracy alone; a sketch of this evaluation follows below.
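A minimal sketch of the suggested evaluation: keep the natural class imbalance and report F1 and precision-recall AUC rather than accuracy alone. The synthetic data (roughly 13% positives, mimicking typical spam prevalence) and the logistic-regression baseline are illustrative assumptions:

```python
# Evaluation on imbalanced data without oversampling (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, average_precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.87], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)  # no oversampling applied

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
# average_precision_score summarizes the precision-recall curve (PR AUC)
print("PR AUC:", average_precision_score(y_test, clf.predict_proba(X_test)[:, 1]))
```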
The findings would be different if the modern approaches suggested earlier were followed. The results are not valid.
The author proposes a data-driven strategy utilising Natural Language Processing (NLP) and machine learning methods to identify both spam and legitimate SMS messages. The suggested model employs machine learning algorithms such as K-Nearest Neighbours (KNN), Decision Trees (DT), Random Forests (RF), Gradient Boosting (GB), Multi-Layer Perceptron (MLP), and Support Vector Machines (SVM). MLP attained superior outcomes regarding accuracy, precision, and F1-score. KNN, RF, MLP, and SVM all yielded identical and optimal recall outcomes.
The author uses a public dataset (the SMS Spam Collection from Kaggle).
The preprocessing stage removes punctuation symbols and converts the text to lowercase. During the tokenization step, the SMS messages are fragmented into smaller units by eliminating stop words and frequent words that are not relevant to the meaning. Next come the stemming and lemmatization phases, aimed at obtaining the base form of the words. The CountVectorizer tool is used to convert SMS messages into a matrix in which tokens are represented by columns, SMS messages by rows, and each cell indicates the number of occurrences of a token in an SMS. The stratified shuffle-split cross-validation method is used to divide the dataset into ten random groups, making sure that each group fairly represents all classes. The author addressed the data imbalance in the dataset before training the machine learning models. A sketch of this pipeline, under assumed defaults, appears below.
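A minimal sketch, under assumed scikit-learn defaults, of the pipeline described above: CountVectorizer token counts with ten-fold stratified shuffle-split cross-validation. The toy messages, stopword list, and classifier settings are illustrative assumptions, not the authors' reported configuration:

```python
# Count-based text pipeline with stratified shuffle-split CV (sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

texts = ["WIN a free prize now!!!", "Are we still on for dinner?",
         "URGENT: claim your reward", "See you tomorrow at the office"] * 50
labels = [1, 0, 1, 0] * 50  # 1 = spam, 0 = ham

pipe = make_pipeline(
    CountVectorizer(lowercase=True, stop_words="english"),  # token-count matrix
    MLPClassifier(random_state=42),
)
cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="f1")
```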
In terms of accuracy, precision, and F1-score, MLP was able to surpass all other competitors. Each of the four models, KNN, RF, MLP, and SVM, produced identical and superior recall outcomes.
- The introduction presents the context without specifying the issue.
- Section 1.1 presents previous works to clarify the context, but the citations of previous works in this section are presented incoherently.
- The research question (how can user information be secured?) has not been addressed. In other words, the authors address SMS filtering to counter phishing or malware attacks without providing examples or linking the obtained results to this motivation.
- The writing of the article needs to be improved to highlight the contribution.
- This article was written well. There are no grammatical errors. Although the language is professional and mostly clear, some technical definitions may have minor ambiguity. To guarantee total clarity, formal definitions of evaluation criteria and methodological processes should be clearly given.
- The article structure seems suitable, but clearly declaring hypotheses and making sure all terms are well defined would help to increase its scientific rigor.
The article researches SMS spam detection, utilizing a well-known dataset (the Kaggle SMS Spam Collection) and commonly used machine learning algorithms (KNN, Decision Tree, Random Forest, etc.). While applying NLP preprocessing and machine learning models to SMS spam detection is a prevalent approach, the originality of this research should be stated and evaluated more thoroughly. The article should explicitly emphasize novel contributions, such as new preprocessing techniques, innovative model architectures, or improved evaluation strategies. A potential issue is limited novelty, unless the authors introduce unique methodological advancements or insights that significantly outperform prior works or address previously unresolved challenges.
The research questions—such as how to detect spam vs. ham SMS, which algorithms are most effective, and how to optimize performance—are relevant and appropriate for the problem domain. However, they are too generic... The article should provide hypotheses or objectives that identify gaps in existing literature and delineate the unique angle of this work.
This research uses some conventional machine learning assessment techniques, including data preprocessing, cross-validation, and performance criteria. A detailed discussion on how class imbalance was addressed (e.g., oversampling), hyperparameter tuning, or model selection criteria is not explicitly provided. Transparency is necessary to establish high technical standards for experiments.
Several details are missing from the experiments (see the sketch after this list for the kind of reporting expected):
- Exact parameters for algorithms (e.g., the number of trees in Random Forest, kernel types in SVM, architecture details for MLP).
- Precise configuration for data preprocessing steps (e.g., specific stemming algorithm, stopword lists).
- Details of cross-validation procedures (e.g., stratification method, random seed, number of folds).
- Implementation details such as software versions, libraries used, or code snippets would significantly enhance reproducibility.
- Clarification on how hyperparameters were selected or tuned (grid search, default settings, etc.).
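A minimal sketch of explicit hyperparameter-selection reporting (grid search with a fixed seed and fold count). The grid values and the synthetic data are hypothetical placeholders:

```python
# Transparent hyperparameter tuning via grid search (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X_train, y_train = make_classification(n_samples=500, weights=[0.87],
                                       random_state=42)
param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="f1")
search.fit(X_train, y_train)
print(search.best_params_)  # the selected configuration, reported explicitly
```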
This research employs commonly used datasets and standard machine learning algorithms such as KNN, Decision Tree, Random Forest, SVM, MLP, and Gradient Boosting. While the results indicate that the Multi-Layer Perceptron (MLP) achieves the highest performance, there appears to be no significant methodological innovation or novel approach introduced. Consequently, the scientific and practical impact of the work may be limited if there are no unique contributions.
The authors mention using a publicly available dataset from Kaggle that is relevant and of sufficient size. However, there is no detailed explanation of how the data were processed beyond basic cleaning steps (removing punctuation, lowercase conversion, tokenization, stopword removal, stemming/lemmatization).
The conclusions indicate that MLP achieved the best metrics (accuracy, precision, F1-score) and that other models (KNN, RF, SVM) also performed well. However, there is a lack of discussion and critical analysis of limitations, such as potential overfitting, dataset bias, or misclassification costs; the conclusions might be overly optimistic.
There is no explicit discussion of how the results answer the initial research questions.
The article's results, showing MLP as the best among the selected ML models, align with existing literature emphasizing the importance of hyperparameter tuning and feature extraction. However, no novel methods or findings are proposed in this article. If the goal is to push further, the authors should investigate deep learning models such as CNNs, LSTMs, or Transformers, which could perhaps enhance performance (a sketch follows below).
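A minimal sketch, assuming TensorFlow/Keras, of the kind of sequence model (an LSTM) suggested above; the vocabulary size, layer widths, and toy data are illustrative assumptions, not a definitive implementation:

```python
# LSTM text classifier for SMS spam (illustrative sketch).
import tensorflow as tf

texts = tf.constant(["Free entry! Text WIN now", "See you at lunch?"])
labels = tf.constant([1, 0])  # 1 = spam, 0 = ham

vectorize = tf.keras.layers.TextVectorization(max_tokens=10000,
                                              output_sequence_length=50)
vectorize.adapt(texts)  # build the vocabulary from the training messages

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Embedding(10000, 64),            # learned word embeddings
    tf.keras.layers.LSTM(64),                        # context over token order
    tf.keras.layers.Dense(1, activation="sigmoid"),  # spam probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(curve="PR", name="pr_auc")])
model.fit(texts, labels, epochs=2, verbose=0)
```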
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.