Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on May 8th, 2024 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on June 21st, 2024.
  • The first revision was submitted on July 3rd, 2024 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on July 14th, 2024.

Version 0.2 (accepted)

· Jul 14, 2024 · Academic Editor

Accept

The authors have addressed all of the reviewers' comments.

[# PeerJ Staff Note - this decision was reviewed and approved by Sebastian Ventura, a 'PeerJ Computer Science' Section Editor covering this Section #]

Reviewer 1 ·

Basic reporting

The revised manuscript is acceptable for publication.

Experimental design

Fine.

Validity of the findings

Fine.

Additional comments

None.

·

Basic reporting

The revisions are well done. I have no further concerns. The manuscript may be published.

Experimental design

N/A

Validity of the findings

N/A

Additional comments

N/A

Version 0.1 (original submission)

· Jun 21, 2024 · Academic Editor

Major Revisions

After carefully considering the reviews from the two reviewers and conducting a thorough reading of the manuscript, it is clear that while the paper presents an innovative approach to text classification, significant improvements are needed to enhance clarity, rigor, and comprehensiveness. Below, I have outlined the necessary revisions based on the reviewers' feedback and additional points that should be addressed.

Basic Reporting
1. Clarification on Domain Context (Line 43)
o Reviewer 1 Comment: "In single-cell research" needs an introduction for readers unfamiliar with this field.
o Required Action: Briefly explain the relevance of single-cell research, mentioning its connection to biological research or RNA studies.
2. Joint Attention Framework Clarification (Line 62)
o Reviewer 1 Comment: The reference to "Attention is all you need" does not clearly explain the joint attention framework between label and text.
o Required Action: Clarify how your model's joint attention mechanism differs from or builds upon the framework described in Vaswani et al. (2017). (For reference, one common label-text attention pattern is sketched after this list.)
3. Label Types Clarification (Lines 95-96)
o Reviewer 1 Comment: Ambiguity about the types of labels.
o Required Action: Clearly define the two types of labels and their roles in the model.
4. Equation and Terminology Consistency (Below Line 121)
o Reviewer 1 Comment: Confusion regarding "BLM" and BiLSTM.
o Required Action: Ensure consistency in terminology, explicitly stating that BLM refers to BiLSTM.
5. Figure and Table Optimization
o Reviewer 1 Comment: Figures 2 and 4 are too large and could be better represented.
o Required Action: Convert Figure 2 into a table and consider starting the y-axis of Figure 4 from a non-zero value to highlight differences more effectively. Alternatively, use a table if it improves clarity.
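For reference on point 2, the snippet below is a purely illustrative sketch of one common way a joint label-text attention layer is implemented (label embeddings attending over token representations). It is not taken from the manuscript, and it is not the self-attention described by Vaswani et al. (2017); all names and dimensions are placeholders.

```python
import torch
import torch.nn.functional as F

def label_text_attention(text_repr, label_repr):
    """text_repr: (batch, seq_len, dim); label_repr: (num_labels, dim)."""
    # Scaled dot-product scores between every label vector and every token vector.
    scores = torch.einsum("ld,bsd->bls", label_repr, text_repr) / text_repr.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)                  # (batch, num_labels, seq_len)
    # One label-aware summary of the text per label per example.
    return torch.einsum("bls,bsd->bld", weights, text_repr)

# e.g. label_text_attention(torch.randn(2, 10, 64), torch.randn(5, 64)) has shape (2, 5, 64)
```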

Experimental Design
1. Dataset Details and Task Clarification
o Reviewer 1 Comment: Insufficient detail on the dataset and the classification task.
o Required Action: Provide a detailed description of the dataset, including the nature of the text (tweets, reviews, etc.) and the specific labels used (e.g., fake news detection).
2. Evaluation on Additional Datasets
o Reviewer 1 Comment: Evaluation is limited to the authors' dataset.
o Required Action: Evaluate the proposed framework on additional datasets to demonstrate its generalizability and robustness.
3. Computational Complexity of Baselines
o Reviewer 1 Comment: Mention computational complexity for a fair comparison.
o Required Action: Include a comparison of the computational complexity (number of parameters) of the non-LLM baselines to contextualize the performance results.
4. Crawling and Cleaning Strategy (Line 203)
o Reviewer 2 Comment: Details on data crawling and cleaning are unclear.
o Required Action: Elaborate on the crawling strategy, the spam filtering methods, and the techniques behind the Baidu open AI data cleaning process.
5. Section Organization
o Reviewer 2 Comment: The section "COORDINATE ATTENTION MECHANISM MODEL WITH IN METHOD" is poorly organized.
o Required Action: Break down this section into smaller, logical subsections to improve readability and coherence.
6. Time Comparison in Table 2
o Reviewer 2 Comment: Include time comparison and hardware specifications.
o Required Action: Add columns for time comparison and specify the hardware used for testing to illustrate the efficiency of the proposed algorithm.
7. Ablation Study Expansion
o Reviewer 2 Comment: The ablation study is too brief.
o Required Action: Expand the ablation study to include 8-10 test cases, providing a more thorough analysis of the model's components.

Validity of the Findings
1. Detailed Dataset Description
o Reviewer 1 Comment: Insufficient detail to replicate results.
o Required Action: Include detailed descriptions such as the length of input texts, preprocessing steps, and any specific data transformations applied.

Additional Comments
• Ensure all acronyms and technical terms are defined upon their first occurrence.
• Improve the overall language and clarity of the manuscript to make it more accessible to a broader audience.
• Consider adding a discussion on potential limitations and future work to provide a balanced view of the study's contributions and areas for further investigation.

Conclusion
The proposed paper has the potential to make a significant contribution to the field of text classification. However, major revisions are required to address the concerns raised by the reviewers and to ensure that the study's findings are presented clearly and comprehensively. The authors are encouraged to thoroughly revise the manuscript and resubmit it for further review.

Reviewer 1 ·

Basic reporting

The paper presents a novel framework aimed at enhancing text classification performance. The authors propose an innovative approach that leverages both text and label information effectively. They integrate several state-of-the-art methods to achieve this goal, demonstrating improvements over existing techniques. The framework is well-constructed, and the experimental results support the efficacy of the proposed method.
A. The paper clearly explains the motivation behind the proposed method and outlines the methodology effectively. However, there are several aspects that would benefit from further clarification or additional detail:
1. Line 43: The phrase "In single-cell research" refers to a non-NLP topic. It would be beneficial to provide a brief introduction to the field, such as mentioning that it pertains to biological research or RNA studies, to ensure that readers who are unfamiliar with this area can understand the context.
2. Line 62: “Rather than employing a conventional self-attention mechanism, the study referenced in (Vaswani et al., 2017) implements a joint attention framework between label and text representations.” I read the paper “Attention Is All You Need” and do not see that it implements such a framework between label and text; could you clarify this point?
3. Line 95, 96: “The label are associated with the note text denoting z_text , and the label with the note text denoting z_label” – So, there are 2 types of labels?
4. Equation below line 121: Does BLM refer to BiLSTM?
B. The figures are too large (e.g., Figure 2, Figure 4). Figure 2 can be turned into a table. Figure 4 does not clearly show the difference in performance; a table could be a better choice, or you could make the y-axis start from a non-zero value (say 80).
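A minimal sketch of the suggested y-axis adjustment in matplotlib, purely illustrative; the model names and accuracy values below are placeholders, not results from the manuscript:

```python
import matplotlib.pyplot as plt

models = ["Baseline A", "Baseline B", "Proposed"]  # hypothetical labels
accuracy = [86.2, 88.5, 90.1]                      # hypothetical values, not the paper's

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(models, accuracy)
ax.set_ylim(80, 95)            # non-zero lower bound so small gaps become visible
ax.set_ylabel("Accuracy (%)")
fig.tight_layout()
fig.savefig("figure4_zoomed.png", dpi=300)
```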

Experimental design

A. Dataset: The authors constructed the dataset themselves and showed some statistics in the paper (number of instances per label). However, more details should be added to clarify what they are trying to do. All I can infer is that, given a text (is it a tweet or something else?), the task is to predict whether the text belongs to label A or B (but I have no idea what kinds of labels are in the dataset; perhaps fake news detection?).
B. The research question is well defined and the number of baselines is sufficient. However, the authors only evaluated the baselines on their own dataset and did not evaluate their framework on other datasets, so a comprehensive and fair comparison is missing.
C. Their evaluation results demonstrate the good potential of the work. They outperform the non-LLM baselines and trail only the strongest LLM-based framework, ERNIE.
- They also argue that the pretrained LLM baselines are more computationally expensive, which is why the LLMs can achieve better performance. The authors should therefore also report the computational complexity (number of parameters) of the non-LLM baselines for a fair comparison.
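A minimal sketch of how the parameter counts could be reported, assuming the baselines are PyTorch modules; the tiny BiLSTM below is a placeholder stand-in, not one of the paper's actual baselines:

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters, the figure usually reported as model size."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

class TinyBiLSTMClassifier(nn.Module):
    """Placeholder baseline; dimensions are illustrative only."""
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=256, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        hidden, _ = self.bilstm(self.embed(token_ids))
        return self.classifier(hidden[:, -1])  # classify from the last time step

print(f"{count_parameters(TinyBiLSTMClassifier()):,} trainable parameters")
```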

Validity of the findings

The dataset should be described in more detail. To replicate the results, more information is required: for example, the length of the input texts, preprocessing steps, etc.

Additional comments

No comment.

·

Basic reporting

The work attempts to classify online comments regarding commits using a proposed natural language processing technique and compares it with standard techniques like BERT, XLNet, etc. I have the following comments regarding the paper:

1. The paper mentions "crawling COVID online reviews" in line 203. The crawling strategy is unclear; mention details of how the data was crawled. Also, this kind of crawling would fetch many spam texts; how have they been filtered? Further, it was mentioned that Baidu open AI data cleaning was used, but the technique behind that is unclear.

2. The "COORDINATE ATTENTION MECHANISM MODEL WITH IN METHOD" section is too unorganized and difficult to follow. Just putting two groups saying "overall plan" is not enough. The entire section needs to be divided into smaller and logical subsections so the paper would be easier to follow.

3. Table 2 could include a time comparison. This helps us understand how optimized the algorithm is and whether or not we require a new algorithm whose performance is unclear. Also, if time comparisons are given, then it is also logical to give the hardware specification of the test environment and its processing capabilities. (A minimal timing sketch is given after this list.)

4. The ablation study is too brief; it should be thoroughly detailed with 8-10 or more test cases.
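A minimal sketch, assuming the compared models are PyTorch modules, of how per-batch inference time and the test hardware could be recorded for Table 2; `model` and `dataloader` are placeholders:

```python
import platform
import time
import torch

def mean_inference_time(model, dataloader, device):
    """Average wall-clock seconds per batch, forward passes only."""
    model.to(device).eval()
    with torch.no_grad():
        start = time.perf_counter()
        for batch in dataloader:            # batch: a tensor of token ids
            model(batch.to(device))
        if device.startswith("cuda"):
            torch.cuda.synchronize()        # wait for queued GPU work to finish
        elapsed = time.perf_counter() - start
    return elapsed / max(len(dataloader), 1)

# Hardware specification to report alongside the timings:
device = "cuda" if torch.cuda.is_available() else "cpu"
print("CPU:", platform.processor() or platform.machine(),
      "| GPU:", torch.cuda.get_device_name(0) if device == "cuda" else "none")
```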

Experimental design

See above

Validity of the findings

See above

Additional comments

See above

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.