All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The manuscript requires some minor revisions. Authors are required to address the suggested modifications in the manuscript, especially improvements to the English language.
[# PeerJ Staff Note: The Academic Editor has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title) #]
The manuscript in its second round is clear and unambiguous. My one concern is the English, which is very simple and needs to be improved to a higher level.
The structure of the article is easy to follow, and it includes many figures that help the reader understand the idea of the algorithm.
The research question is well defined and relevant. The article explains the solution to the given question nicely.
The article presents performance and validation approaches that were applied to the suggested algorithm.
The results are statistically sound.
Line 375: and -> should be just and
Line 261: I would use a number instead of "no".
The authors are supposed to use mathematical styling when writing mathematical phrases and equations, such as KI in lines 299 and 310.
Please revise your manuscript, addressing all the comments and suggestions made by the reviewers.
[# PeerJ Staff Note: Please ensure that all review comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. It is a common mistake to address reviewer questions in the response letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the response letter. Directions on how to prepare a response letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]
The language of the manuscript is clear and unambiguous. The paper is structured well, including figures, tables and shared raw data.
The research questions are well defined, meaningful and relevant.
The experimental results are well stated.
In this paper, a novel classification scheme based on LDA topic modelling has been presented to classify noisy and sparse textual data. In general, the paper is structured well and contributes to the literature. However, a number of issues listed below should be taken into consideration to improve the content of the paper:
1- The language of the manuscript should be enhanced.
2- The manuscript lacks a discussion of recent studies (2020-2021 papers) on LDA2vec.
3- The empirical results should include statistical validity tests.
The authors present a text classification scheme enriched with LDA-based features.
The manuscript was ambiguous in some sections (see attached PDF); however, the overall idea is clear.
The references are clearly provided, however, please re-check that the citation style is consistent (sometimes you have just numbers, e.g., (14)).
The article structure is OK, with the exception of some low-resolution images.
The authors evaluate their method first by exploring different hyperparameter settings that offer sufficient performance. In the second step, they compare to existing baselines. The first part is done adequately, however, I have concerns related to the second part of the evaluation.
Here, the authors compare their method to "baselines"; however, it remains unclear whether they actually ran the experiments against the baseline methods or simply took performance figures from the literature. In the latter case, please additionally clarify why the results are comparable. Furthermore, I would suggest using at least one of the following baselines that other researchers are familiar with, so it is clear that your method actually works:
1.) doc2vec + LR (scikit-learn)
2.) BERT (end-to-end)
3.) BERT (sentence-bert) + LR (scikit-learn)
Comparison against these could shed additional light on whether (and when) the proposed method performs well. Finally, the number of datasets could be larger; you are only exploring the reviews. However, I would be willing to believe there is potential if the proper comparisons are conducted (a sketch of one such baseline follows).
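For concreteness, here is a minimal sketch of baseline 3, assuming the sentence-transformers and scikit-learn packages; the model name, toy texts, and labels are illustrative placeholders, not the paper's data.

# Minimal sketch of baseline 3 (sentence-BERT embeddings + logistic regression),
# assuming the sentence-transformers and scikit-learn packages. The model name,
# toy texts, and labels are illustrative placeholders, not the paper's data.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

train_texts = ["great product", "poor quality", "works well", "broke quickly"]
train_labels = [1, 0, 1, 0]
test_texts, test_labels = ["very satisfied"], [1]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any pretrained SBERT model
X_train = encoder.encode(train_texts)              # dense sentence embeddings
X_test = encoder.encode(test_texts)

clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
print("macro-F1:", f1_score(test_labels, clf.predict(X_test), average="macro"))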
The findings are in alignment with the empirical evaluation as it stands: the proposed method is superior to the others. One problem I have is with the statistical significance claim: you claim that the results are significantly better, but I saw no statistical tests conducted to verify this claim.
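On that point, a paired test over cross-validation folds would suffice; a minimal sketch assuming scipy, where the per-fold scores are purely illustrative numbers and not results from the paper:

# A paired Wilcoxon signed-rank test over cross-validation folds, assuming
# scipy. The per-fold accuracies below are purely illustrative numbers,
# not results from the paper.
from scipy.stats import wilcoxon

t2f_scores      = [0.86, 0.84, 0.87, 0.85, 0.88]  # proposed method, 5 folds
baseline_scores = [0.82, 0.83, 0.84, 0.81, 0.85]  # strongest baseline, same folds

stat, p = wilcoxon(t2f_scores, baseline_scores)
print(f"Wilcoxon signed-rank: statistic={stat:.3f}, p={p:.4f}")
# "Significantly better" should only be claimed if p falls below the chosen
# alpha (e.g. 0.05).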
Conclusions could be longer (see attached PDF).
I have also left numerous comments in the attached PDF, which will hopefully improve the manuscript's quality.
The article proposes a framework called Topic2features. Generally, it is well written in terms of paper organization and writing style. The framework has been described in detail, tested with the Amazon dataset, and compared and evaluated against several algorithms and benchmarks.
I have no major issues; the paper can be accepted as it is.
acceptable
acceptable
Very good paper, and well written.
The paper proposes a novel framework, topic2features (T2F), to deal with short and sparse data using the topic distributions of hidden topics gathered from the dataset and converting them into features to build a supervised classifier.
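The general idea (topic proportions replacing bag-of-words features) can be sketched with scikit-learn; this is an illustrative sketch of the representation only, not the authors' T2F implementation, and the toy documents and parameters are invented.

# Illustrative sketch of topic proportions as classification features,
# assuming scikit-learn; this is not the authors' T2F implementation, and
# the toy documents, labels, and parameters are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["fast shipping great price", "battery died after a week",
        "works as advertised", "screen cracked on arrival"]  # toy short texts
labels = [1, 0, 1, 0]                                        # 1 = positive

# Each document is mapped to its distribution over K hidden topics, and that
# K-dimensional vector (instead of a bag of words) feeds the classifier.
model = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=2, random_state=0),
    LogisticRegression(),
)
model.fit(docs, labels)
print(model.predict(["screen cracked after a week"]))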
It seems that the merit of the study is very interesting, and it contains a novel approach to representation based on topics instead of a bag of words.
The authors fail to give a clear idea of the main approach of this study. The approach is described very vaguely, and it is hard to see how the topics were used in a vector space.
The study needs to be improved dramatically, with more description of the algorithm using pseudocode or clearer workflows and figures.
I will not be able to judge the content of the study without a clear understanding of the novelty of topic representations. This section should be improved.
Also, I would suggest that the authors send the article for English proofreading before resubmitting it.
The study shows satisfactory experiments.
Line 287: "might me" should be "might be".
Figure 6 should be replaced with a clearer image or simply with a regular table.
Figure 7 should be replaced with a clearer image or simply with a regular table.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.