Review History
Experimental study on short-text clustering using transformer-based semantic similarity measure

All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

Summary

The initial submission of this article was received on October 18th, 2023 and was peer-reviewed by 3 reviewers and the Academic Editor.
The Academic Editor made their initial decision on January 14th, 2024.
The first revision was submitted on April 17th, 2024 and was reviewed by 1 reviewer and the Academic Editor.
The article was Accepted by the Academic Editor on May 3rd, 2024.

Version 0.2 (accepted)

Xiangjie Kong · May 3, 2024 · Academic Editor

The authors have revised the paper accordingly. The reviewer is satisfied with this version. It can be accepted now.

Reviewer 3 · Apr 28, 2024

Basic reporting

This revision improves on the previous version.

Experimental design

no comment

Validity of the findings

The findings are well organized in this version.

Cite this review as

Anonymous Reviewer (2024) Peer Review #3 of "Experimental study on short-text clustering using transformer-based semantic similarity measure (v0.2)". PeerJ Computer Science

Download Version 0.2 (PDF) Download author's response letter - submitted Apr 17, 2024

Version 0.1 (original submission)

Xiangjie Kong · Jan 14, 2024 · Academic Editor

Major Revisions

The work is interesting and has value. However, the reviewers also have some concerns about originality, experiments, etc. The authors need to improve the work according to the comments.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1 · Nov 11, 2023

Basic reporting

Clear and unambiguous professional English is used throughout the paper, although the heavy use of acronyms makes reading laborious.
The literature references are not sufficient. In particular, the discussion of semantic measures based on an external knowledge base considers only ontologies and semantic networks and ignores web-based measures. There are several lines of research on web-based semantics applied to sentence similarity and concept proximity. These cover not only the detection of proximity in meaning between texts (via sentence comparison or heuristic semantic walks in web-based concept networks such as Wikipedia) but also semantic similarity of sentences, with applications such as emotion recognition (emotion vector extraction based on models such as Ekman's or Plutchik's, or semantic models for emotion recognition in web objects, e.g. comments on a topic in any web repository or social/communication network), context-based semantic similarity of images using image recognition from metadata, and context recognition in social networks.
Word embedding is also often used for text mining, sentiment analysis, and emotion or topic recognition. The real novelty of this paper lies in applying word embedding, together with transformer-based deep learning classification (the latest state of the art), to clustering.
Without an analysis of the state of the art, a comparison of the results with existing approaches and methods is lacking, and it is unclear whether the proposed methodology improves on the literature, and within what limits.
There is sufficient clarity and expressiveness in the article structure, figures, tables, and data.

Experimental design

The choice of datasets should be motivated beyond their popularity in general topic-classification applications. Similarly, a more detailed motivation should be given for the choice of k-means and agglomerative clustering, which are used without any refinement with respect to the models provided by recent research on clustering.

The paper states how the research fills an identified knowledge gap, but it discards web-based approaches to sentence-based semantics, which weakens the motivation.
Methods are described in sufficient detail to replicate the approach, but it would be beneficial to have access to the whole code and the preprocessed data. At present, only a few main functions are available in the GitHub repository, with no readme file or instructions for reproducing the results.

Validity of the findings

The claim that “This study, driven by the hypothesis that integrating sentence embedding techniques could improve the efficacy of these clustering methods, offers significant contributions” should come from a reviewer rather than from the authors.
The conclusions speak more to the merit of the paper than to the merit of the results; they should include a detailed analysis and a comparison of the results with the state of the art.

Cite this review as

Anonymous Reviewer (2024) Peer Review #1 of "Experimental study on short-text clustering using transformer-based semantic similarity measure (v0.1)". PeerJ Computer Science

Reviewer 2 · Jan 4, 2024

Basic reporting

The abstract provides a clear overview of the research, addressing the importance of sentence clustering and highlighting the gap in evaluating clustering performance using low-dimensional continuous representations. It effectively introduces the new implementation incorporating a sentence similarity measure based on embedding representation.

The author is encouraged to incorporate recent references to reflect the latest developments in the field and improve the coverage of the state-of-the-art literature.

The overall structural organization of the article is satisfactory.

The figures and tables have been appropriately presented and labeled.

It would be beneficial for the author to include information about the improvement in results or accuracy in the abstract for a more comprehensive overview.

Experimental design

no comment

Validity of the findings

The conclusion would be strengthened by including a discussion on the achieved accuracy and results, providing a more comprehensive overview. It is recommended that the authors explicitly address this aspect in the conclusion section.

Cite this review as

Anonymous Reviewer (2024) Peer Review #2 of "Experimental study on short-text clustering using transformer-based semantic similarity measure (v0.1)". PeerJ Computer Science

Reviewer 3 · Jan 7, 2024

Basic reporting

This paper is well written in English and it is clear.
The references are sufficient.
The organization of this paper is professional.
It is also self-contained.
Most of the formal results are included.

Experimental design

Through experiments, the authors try to demonstrate that incorporating the sentence embedding measure leads to significantly improved performance in text clustering tasks. This sounds interesting, and the experiments are informative and described in detail.

However, the experimental study is not supported by theoretical analysis in this paper, and on its own it is not enough to support the conclusion.

Validity of the findings

I don't think the originality of this paper is sufficient. The authors should carefully polish the paper and try to clearly present the main idea and its theoretical support.

Cite this review as

Anonymous Reviewer (2024) Peer Review #3 of "Experimental study on short-text clustering using transformer-based semantic similarity measure (v0.1)". PeerJ Computer Science

Download Original Submission (PDF) - submitted Oct 18, 2023

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
