Javascript is disabled in your browser. Please enable Javascript to view PeerJ.

Review History
An adaptive method for determining the optimal number of topics in topic modeling

All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

The initial submission of this article was received on September 16th, 2024 and was peer-reviewed by 3 reviewers and the Academic Editor.
The Academic Editor made their initial decision on December 20th, 2024.
The first revision was submitted on January 14th, 2025 and was reviewed by 2 reviewers and the Academic Editor.
The article was Accepted by the Academic Editor on January 30th, 2025.

Version 0.2 (accepted)

Giovanni Angiulli · Jan 30, 2025 · Academic Editor

Dear Authors,

Your paper has been revised. It has been accepted for publication in PEERJ Computer Science. Thank you for your fine contribution.

[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]

Prem Sankar C · Jan 29, 2025

Basic reporting

This session has been modified as per the previous comments.

Experimental design

This session has been modified as per the previous comments.

Validity of the findings

This session has been modified as per the previous comments.

Additional comments

This session has been modified as per the previous comments, incorporating the feedback and modification suggested earlier

Cite this review as

Sankar C P (2025) Peer Review #1 of "An adaptive method for determining the optimal number of topics in topic modeling (v0.2)". PeerJ Computer Science

Reviewer 2 · Jan 28, 2025

Basic reporting

The authors have fulfilled all points mentioned before

Experimental design

The experiments are sufficient and we'll explained

Validity of the findings

Valid

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "An adaptive method for determining the optimal number of topics in topic modeling (v0.2)". PeerJ Computer Science

Download Version 0.2 (PDF) Download author's response letter - submitted Jan 14, 2025

Version 0.1 (original submission)

Giovanni Angiulli · Dec 20, 2024 · Academic Editor

Major Revisions

Dear Authors,

Your paper has been revised. Based on the reviewers' reports, major revisions are needed before it is considered for publication in PEERJ Computer Science. The issues you have to fix in your revised version of your paper are mainly the following:

1) You must provide a more explicit development of the proposed solution and motivation for the conducted research.

2) Despite your research being conducted using robust statistical methods and appropriate controls, a more detailed discussion about the conducted statistical analyses and their assumptions could strengthen the validity of the findings. Furthermore, You must explicitly assess the impact and novelty of your findings.

[# PeerJ Staff Note: The review process has identified that the English language must be improved. PeerJ can provide language editing services if you wish - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Your revision deadline is always extended while you undergo language editing. #]

Prem Sankar C · Oct 22, 2024

Basic reporting

Clarity and English: The language is generally clear and professional, but there are a few instances where the phrasing could be improved for better clarity. For example, the definition of inter-class distance could be presented more concisely and understandably in the introduction.
Literature References: Sufficient field background and context are provided, with a comprehensive review of existing topic modeling methods.
Article Structure: The article has a professional structure with clear sections and subsections.
Figures and Tables: The figures and tables are well-presented and support the text effectively.
Overall, the basic reporting is generally good, but there are a few minor areas for improvement.

Experimental design

Original Primary Research: The research presented in the paper is original and aligns with the journal's aims and scope.
Research Question: The research question is well-defined, relevant, and meaningful. The paper clearly states how the research fills an identified knowledge gap in the field of topic modeling.
Rigorous Investigation: The investigation appears to be conducted to a high technical and ethical standard. The methods are described in sufficient detail to allow for replication.

Validity of the findings

Impact and Novelty: While the paper does not explicitly assess the impact and novelty of its findings, it presents a novel method for determining the optimal number of topics in topic modeling.
Robustness, Statistical Soundness, and Control: The research appears to be conducted using robust statistical methods and appropriate controls. However, a more detailed discussion of the statistical analyses and their assumptions could strengthen the validity of the findings.
Conclusions: The conclusions are well-stated and linked to the original research question. They are limited to supporting results and avoid making excessive claims.
Overall, the validity of the findings is generally strong, but there are a few areas where additional information could be provided to enhance the credibility of the research.

Additional comments

Clarity and Conciseness: While the paper is generally well-written, there are a few instances where the language could be further clarified or condensed. This would improve readability and enhance the overall flow of the paper.
Focus: The introduction could be more focused on the specific contributions of the paper and how they address the identified research gap.
Comparison: A more explicit comparison between the different topic models and their strengths and weaknesses would be beneficial.
Provide a Table: A table summarizing the key characteristics of the different topic models (e.g., probabilistic vs. non-probabilistic, generative vs. discriminative) could help readers understand their differences more easily
Discuss the Impact of Preprocessing: Discuss the impact of different text preprocessing techniques on the performance of AICDR and the overall topic modeling results.
Discussion of Limitations: The paper could discuss potential limitations of AICDR, such as its sensitivity to the quality of the clustering results or its computational cost.
Real-World Applications: The paper could explore potential real-world applications of AICDR to highlight its practical significance.

Cite this review as

Sankar C P (2025) Peer Review #1 of "An adaptive method for determining the optimal number of topics in topic modeling (v0.1)". PeerJ Computer Science

Reviewer 2 · Oct 28, 2024

Basic reporting

The authors suggested a model for topic modelling But here are some comments.

1- The manuscript contains many typos. It needs proof reading. Also, the format of paragraphs is not (justify) which is not a formal way of writing.
2- The literature review is very long. It is more like a survey. However, the references used are not up to date. The most recent reference was from 2022

Experimental design

The results are not justified. It needs more visualization and justification of performance

Validity of the findings

There is need of a comparison between results obtained and the state of the arts results

Additional comments

The manuscript is not well organized. It needs major revision

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "An adaptive method for determining the optimal number of topics in topic modeling (v0.1)". PeerJ Computer Science

Amin Hosseiny Marani · Nov 11, 2024

Basic reporting

I would like to thank reviewers for tackling such a historic issue with topic modeling, finding optimal K. However, I think this work can benefit from one or more round of reviews to be ready for publication. Here is a list of basic concerns and questions I would like to share with authors.

-motivation of the paper is not clear. Why do we need inter-class distance clustering-based assessment approach?

- main concern: if authors want to get the optimal accuracy why not comparing K vs. accuracy/performance but defining an extra parameter? I can see analyzing the potential correlation/relationship between performance and AICDR(K) would be an interesting experiment but I have a hard time convincing myself that AICDR can replace K vs. performance.

-I don’t follow the logic and need for “classification methods-based topics models” Section and explanation. I’d rather would like to see how topic models would be used for classifications; e.g., using topic-document features.

-There are sentences that need reviews and/or are hard to assess without references. e.g., “the number of topics determines the model complexity, which affects the 14 explanatory ability of the model and the accuracy of topic mining. Some topic models, including 15 text clustering methods, may employ different embedded methods to determine the number of 16 topics, with some methods exhibiting lower adaptability.”, “obtain high19 accuracy clustering results”, or “Several benchmark text corpora are used to 22 compare AICDR with existing methods., or “However, 47 perplexity has been demonstrated that perplexity does not reflect the semantic coherence of 48 topics (Chang et al. 2009)”

-While authors cited Bertopic, they did not include Bertopic or any other Neural Network based models to compare the results with? It is not clear why Such methods were left to compare.

-Human assessment has been a crucial assessment phase of topic modeling methods e.g., Lau et al 2011 (machine reading tea leaves), or Roder et al 2014. If there is a reason not to consider that or a limitation, authors must mention so readers know there are other important assessment techniques as well.

Besides these concerns, I would like to thank authors for a clear description of the problem in the intro; it was easy to understand why authors are interested in this problem and why this is important to tackle. However, this paper still needs a clearer development of the solution and motivation.

Experimental design

Experimental design can benefit from a clearer results Section descriptions; e.g., how each assessment technique work technically. For example, it was personally difficult for me to understand how Elbow method works. My assumption would be Elbow method computes SSE for each K, but then how SSE can be computed for LDA or NMF?

Validity of the findings

The findings were defined and put in a way that was almost easy to follow. A side-by-side comparison in table was also made it easy to understand what to compare. Authors also provided the codes and documentations of their work.

Additional comments

Here are a list of additional concerns I would like to share with authors.

-Please review citations; e.g., this is a wrong citation “BERTopic (Abuzayed & Al-Khalifa 2021).”

-suggestion: contribution is written as a general view of Method Section. Authors can keep it short and only mention the contribution they are offering in this paper, rather than technical details.

-suggestion: Explaining topic models with equations seems unnecessary, especially since the contribution does not align with topic modeling methods per se.

-what is the relationship of stability and classification performance? While I can think of replicability being an issue for topic models, it is not immediately clear to the readers.

Cite this review as

Hosseiny Marani A (2025) Peer Review #3 of "An adaptive method for determining the optimal number of topics in topic modeling (v0.1)". PeerJ Computer Science

Download Original Submission (PDF) - submitted Sep 16, 2024

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Review History An adaptive method for determining the optimal number of topics in topic modeling

Summary

Version 0.2 (accepted)

Giovanni Angiulli · Jan 30, 2025 · Academic Editor

Prem Sankar C · Jan 29, 2025

Basic reporting

Experimental design

Validity of the findings

Additional comments

Reviewer 2 · Jan 28, 2025

Basic reporting

Experimental design

Validity of the findings

Version 0.1 (original submission)

Giovanni Angiulli · Dec 20, 2024 · Academic Editor

Prem Sankar C · Oct 22, 2024

Basic reporting

Experimental design

Validity of the findings

Additional comments

Reviewer 2 · Oct 28, 2024

Basic reporting

Experimental design

Validity of the findings

Additional comments

Amin Hosseiny Marani · Nov 11, 2024

Basic reporting

Experimental design

Validity of the findings

Additional comments

Review History
An adaptive method for determining the optimal number of topics in topic modeling