Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on January 29th, 2025 and was peer-reviewed by 5 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on May 9th, 2025.
  • The first revision was submitted on June 25th, 2025 and was reviewed by 3 reviewers and the Academic Editor.
  • A further revision was submitted on August 25th, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • A further revision was submitted on September 23rd, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on November 14th, 2025.

Version 0.4 (accepted)

· · Academic Editor

Accept

The author has addressed the reviewer comments properly. Thus, I recommend publication of the manuscript.

Reviewer 1 ·

Basic reporting

The author's response has addressed all my concerns. I believe this version is ready for publication.

Experimental design

I have no more comments.

Validity of the findings

I have no more comments.

Additional comments

I have no more comments.

Version 0.3

· · Academic Editor

Major Revisions

Kindly revise the manuscript as per the reviewer suggestions and resubmit it.

Reviewer 1 ·

Basic reporting

The article is clearly structured. The English language usage is generally fluent. Figures and tables are thoughtfully designed.

Experimental design

This paper addresses the limitation of existing MRC approaches to the ASTE task, which fail to adequately model grammatical structures and multi-aspect-opinion correspondence relationships. The authors propose a multi-turn MRC framework that integrates POS tagging and syntactic dependency features while incorporating prompt learning for sentiment polarity prediction. The method is supported by well-designed experiments using four benchmark datasets and comparisons with diverse baseline methods.

Validity of the findings

The method demonstrates significant superiority over existing methods across multiple datasets, particularly showing improvements in the AESC and AOPE subtasks. However, why are the results of some methods, such as EMC-GCN and COM-MRC, missing from Table 2? This issue should be addressed.

Additional comments

No comments.

Version 0.2

· · Academic Editor

Minor Revisions

Kindly revise the manuscript as per Reviewer 5's suggestions and resubmit it.

Reviewer 2 ·

Basic reporting

The authors have addressed all my concerns.

Experimental design

N/A

Validity of the findings

N/A

Additional comments

N/A

Reviewer 3 ·

Basic reporting

The author resolved my issue. I suggest accepting the paper.

Experimental design

The author resolved my issue. I suggest accepting the paper.

Validity of the findings

The author resolved my issue. I suggest accepting the paper.

·

Basic reporting

As a reviewer, I walked through all the supporting materials submitted by the authors.
I carefully read the submitted response letter, which covers the comments from all the reviewers, and noticed that the related works section was updated following other reviewers' recommendations.

Major comment here: all the novel concepts mentioned in the newly reviewed work should be self-consistent at the level of the whole manuscript. It is recommended to stick to the existing terminology covered in the manuscript and map novel works onto it rather than replicating their titles. See comments below.
Comments:
- With the revised version there is a need to explain "slot filling" and the concept of slots in particular. What does a slot refer to in the context of the whole manuscript? (A span?) How is the idea of slots covered?
- Cross-task alignment: which tasks are to be aligned?

Next, on the visual quality and readability of the figures: the figures have improved in quality.
My remaining comments concern forming more consistent narratives in the figure captions; see examples in the comments section.

Next, the authors highlight the difference between PromptReader and the existing BMRC.

The authors did decent work in presenting the obtained results in a more accessible form than in the previous manuscript. Comments on this are provided in the other sections of this report accordingly.

Experimental design

The experimental design has definitely become better than in the previous version.
The authors addressed quite a decent number of comments.

The comments would be:
- How many runs were used for obtaining the results? (If several, please mention.)
- Findings consistency: "slightly outperforms" is expected to be accompanied by the related percentage performance improvement.

Implementation details
- "To ensure reproducibility, we fix the random seed to 42 across all experiments"
- Fixing the seed and relying on the results of a single run does not ensure reproducibility: training a new model results in a different state on each run anyway. The best the authors can do is note the seed value they picked.
- There is no mention of input-size trimming as a limitation on tokens. BERT supports 512 tokens for placing the full input in memory. How are longer input texts handled?
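The multiple-runs point above can be illustrated with a short sketch: repeat training over several seeds and report mean and standard deviation, rather than a single seed-42 number. Here `train_and_eval` is a hypothetical stand-in for the authors' training pipeline; it only simulates the run-to-run variance that a single fixed seed hides.

```python
import random
import statistics

def train_and_eval(seed: int) -> float:
    """Hypothetical stand-in for one full train/evaluate cycle.

    A real implementation would seed Python, NumPy, and PyTorch,
    train the model, and return the test F1. Here we only simulate
    run-to-run variance around a fixed score.
    """
    rng = random.Random(seed)
    return 0.72 + rng.gauss(0, 0.005)  # simulated F1 near 0.72

seeds = [42, 43, 44, 45, 46]
scores = [train_and_eval(s) for s in seeds]
mean_f1 = statistics.mean(scores)
std_f1 = statistics.stdev(scores)
print(f"F1 over {len(seeds)} seeds: {mean_f1:.4f} +/- {std_f1:.4f}")
```

Reporting mean ± std over several seeds substantiates a claim like "slightly outperforms" far better than any single-run number.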

Result analysis raises several concerns:

Next, the comments on the difference between JET, GTS, and the proposed method are not clear.
- Table 6:
- Why are several entries that are similar among JET, GTS, and Ours marked 'x' only for JET and GTS?
- Sentence 4, "The wine is wonderful":
How could this be demonstrated, or is there any other content in the paper that showcases this difference?
- "focusing solely on the global semantic information of the sentence":
Without factual proof, this claim could be categorized as an "assumption" that contributes nothing in particular to the comparison in the context of Example 3.
The revised clarification of the baseline selection is largely clear.

The code lacks a `requirements.txt` for complete reproduction and also lacks a README.md.
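For illustration only, a minimal `requirements.txt` for a BERT-based project of this kind might look like the following; the package names and version pins are assumptions, not taken from the authors' repository:

```text
torch==2.1.0
transformers==4.35.2
numpy==1.26.4
spacy==3.7.2
```

Pinning exact versions, together with a README describing how to run training and evaluation, is what makes "complete reproduction" realistic.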

Validity of the findings

The new architecture, dubbed PromptReader, represents a multi-round framework.
It exploits a transformer architecture and represents a solution similar to the one raised in the paper and commented on by other reviewers. The authors address the novelty through a detailed comparison of the proposed system against Dual-MRC and BMRC. The authors contribute a comprehensive comparison of this system on various tasks (AESC, AOPE, ASTE) against all the other systems. Although the results of the other methods were taken from the original papers, the overall behaviour of the proposed PromptReader looks plausible.

Additional comments

Comments:
- Figure 1:
- The meaning of the arrows in the context of the architecture figure is not clear. Is it a process? Better: explicitly explain the meaning of the blocks and arrows.
- Multi-round or 3-round architecture?
- "This is the detailed internal structure of the PromptReader" -> "the detailed ..."
- "Below is the machine reading comprehension framework used in the first two rounds of queries, and above is the framework."
Where is below / above? (The caption is under the figure.) Better: introduce in the first sentence that the figure has a top and a lower part, and then use "top" and "bottom".

- Figure 4: The syntax tree is unnecessarily large. Please pad it on both sides to make it consistent in scale with the rest of the content in the paper, similar to Figure 1.


Reviewer 5
Noticed improvements on ABSA through the covered recommendations and the revised review of the related works section.
Question 3.
- OK, it again refers to the discussion raised by other reviewers on the existence of Dual-MRC and recalls the similarities.
Question 5.
- The train / test / dev split now has a declared policy.
Question 6. The authors did not reproduce Dual-MRC but rather cite its results.
- OK, clarified.
Question 7. Result consistency claims
- OK, improved

Version 0.1 (original submission)

· · Academic Editor

Major Revisions

Dear Dr. Yuyao,

Please consider revisions demonstrating the validity of the findings, update the related literature, provide significance tests, provide details on the annotation quality or reliability of ground truth labels for the benchmark datasets, compare with state-of-the-art work, and better clarify the objectives of the manuscript.

**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1 ·

Basic reporting

1. The related works could be further improved by incorporating recent high-quality studies from top conferences or journals in recent years. For example, Enhanced Machine Reading Comprehension Method for Aspect Sentiment Quadruplet Extraction, ACL 2022; USSA: A Unified Table Filling Scheme for Structured Sentiment Analysis, ACL, 2023; Exploiting Duality in Aspect Sentiment Triplet Extraction With Sequential Prompting, in TKDE, 2024; Aligner²: Enhancing Joint Multiple Intent Detection and Slot Filling via Adjustive and Forced Cross-Task Alignment, AAAI 2024.

2. Some figures lack clarity, which needs to be improved.

Experimental design

1. The paper lacks a comparison with state-of-the-art works in recent years, which would provide a clearer assessment of the proposed approach’s advancements. For example, STAGE: span tagging and greedy inference scheme for aspect sentiment triplet extraction; Dual-channel span for aspect sentiment triplet extraction

2. There is insufficient discussion of reproducibility; please provide more details on the implementation, the experimental environment, and how the reported results were chosen.

3. To better understand the method's robustness, you can analyze the impact of different prompt formats on performance.

Validity of the findings

1. The proposed method appears to be inspired by BMRC, but this is not explicitly stated in the paper. It is recommended to clearly acknowledge this connection.

2. The paper presents a case study comparing JET and GTS; however, the rationale behind selecting these two models is not clearly explained.

Reviewer 2 ·

Basic reporting

This paper presents a novel multi-turn MRC framework that incorporates prompt learning for the ASTE task. The proposed method consists of three rounds of queries, with the first two rounds designed as MRC tasks to identify aspect-opinion pairs and the final round formulated as a prompt learning task to predict sentiment polarity. Overall, this paper discusses an interesting issue and is well organized. Below are some comments.

1. The Introduction section presents a comprehensive review of the ABSA subtasks. However, sentiment is not always represented using a categorical model (e.g., positive and negative). A new ABSA subtask called dimensional aspect-based sentiment analysis (dimABSA) uses a dimensional model to represent sentiments with sentiment scores on multiple dimensions (e.g., valence and arousal). The authors should discuss dimABSA (see the overview paper of the shared task for Chinese dimensional aspect-based sentiment analysis).

2. Several recent ABSA studies are not discussed, including the use of syntactic dependency information [1-3], clause-level information [3], contrastive Learning [4-5], and LLM [6].
[1] https://www.sciencedirect.com/science/article/pii/S0925231223008536
[2] https://ieeexplore.ieee.org/abstract/document/9976197
[3] https://www.ijcai.org/proceedings/2018/0617.pdf
[4] https://aclanthology.org/2023.findings-emnlp.727/
[5] https://aclanthology.org/2024.lrec-main.1305/
[6] https://aclanthology.org/2024.findings-acl.284/

The following baseline should be included for comparison or discussed.
https://ieeexplore.ieee.org/abstract/document/10175600


Experimental design

.

Validity of the findings

Consider providing more examples in the Case Study and enlarging the table.

Reviewer 3 ·

Basic reporting

1. The author's description of the problem solved by the article is not clear, using the vague statement "end-to-end sequence labeling methods have not fully utilized the given context". After reading the full text, I am still not quite clear about the problem being solved or the necessity argued by the author.

2. The diagram design is reasonable. However, in Fig. 4 (example of context dependence), the explanation is too brief, with only the title and a description, without elaborating on how context information is extracted based on syntactic dependence. This weakens the auxiliary role of the figure; I recommend supplementing it with concrete analysis.

Experimental design

The baseline models used for comparison are dated and unconvincing.

Validity of the findings

1. The paper does not provide statistical significance tests to determine whether the performance improvement is significant. The numerical difference of the F1-score alone is not convincing enough, and statistical analysis is recommended.

2. The case study (Fig. 7) provides some error analysis, pointing out the baseline's failures on sentences with complex grammar or multiple sentiment polarities. However, the depth of the analysis is insufficient, showing only successful cases and not systematically exploring the model's error patterns (such as the specific reasons for missed triplets). It is recommended to supplement this with quantitative error analysis, such as the distribution of error types.

Reviewer 4 ·

Basic reporting

The manuscript is generally well-written and clearly structured. It provides a comprehensive background on aspect-based sentiment analysis (ABSA), including its evolution into the ASTE task. The authors cite a substantial body of related work, and references are relevant and up to date. Figures are appropriate, well-labeled, and helpful in visualizing both task design and model structure (e.g., Figures 2 and 3). Raw data is appropriately described and includes statistics on the datasets used (Table 1).

There are a few problems with this paper:
In line 170, ....direction. (?) proposed a unified generative framework that formalized various ABSA tasks into text. The (?) suggests that this paper was compiled in a hurry.

Experimental design

The research question is clearly defined: how to improve ASTE performance by combining multi-round MRC with prompt learning, while integrating part-of-speech and syntactic dependencies. The framework is novel in its design of multi-round queries and the application of prompt learning in ASTE. Methodology is well-detailed, including model architecture, training procedures, and query design. Ablation studies are thoughtfully constructed and convincingly demonstrate the contribution of each component (e.g., part-of-speech features, prompt learning). Benchmark datasets are appropriate and widely accepted within the ABSA research community.

I would suggest that more details on the annotation quality or reliability of ground-truth labels for the benchmark datasets would be beneficial, particularly for interpreting nuanced sentiment and triplet relationships. It would be helpful to discuss the computational efficiency or complexity of PromptReader compared to baseline methods. Consider explicitly addressing whether the multi-round querying process incurs a significant computational overhead.

Validity of the findings

Results show consistent improvement across multiple benchmark datasets and tasks (AESC, AOPE, ASTE). The empirical gains are supported with statistical evidence and further backed by ablation studies (Table 3, Table 4, Figure 5). The conclusions are well-linked to the experimental results and do not overstate the claims. The authors clearly demonstrate that each proposed model component adds measurable value.

The paper would benefit from a discussion on potential failure cases or limitations of the model, e.g., how the model performs with idiomatic expressions or implicit sentiments. While PromptReader shows superior performance, adding statistical significance tests (e.g., paired t-tests or bootstrapped confidence intervals) would strengthen the claim of superiority. It is unclear how well this model would generalize to non-English or low-resource settings. A brief discussion of this limitation or future work in this area would be beneficial.
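The bootstrapped confidence interval suggested here can be sketched as follows; the per-sentence scores are simulated and `bootstrap_ci` is a hypothetical helper, not code from the paper under review.

```python
import random
import statistics

def bootstrap_ci(model_scores, baseline_scores, n_boot=10_000, alpha=0.05, seed=0):
    """Paired bootstrap CI for the mean score difference (model - baseline).

    Resamples per-example differences with replacement and takes the
    empirical (alpha/2, 1 - alpha/2) quantiles of the resampled means.
    """
    rng = random.Random(seed)
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    boot_means = sorted(
        statistics.mean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical per-sentence scores for two systems on the same test set.
rng = random.Random(1)
baseline_scores = [rng.uniform(0.5, 0.9) for _ in range(200)]
model_scores = [b + rng.gauss(0.02, 0.05) for b in baseline_scores]

lo, hi = bootstrap_ci(model_scores, baseline_scores)
print(f"95% CI for mean improvement: [{lo:.4f}, {hi:.4f}]")
```

If the interval excludes zero, the improvement is unlikely to be chance; a paired t-test on the same per-sentence differences is the parametric alternative.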

·

Basic reporting

Overview.
---------

For each section individually.

Introduction:
The context of the studies is aspect-based sentiment analysis (ABSA).
Traditionally, it is about sentiment opinion extraction towards an object with respect to the given aspect.
The SA problem considers extracting the sentiment opinion while the rest of the information is given.
- An improvement to the introduction would be defining Sentiment Analysis / ABSA.
Having a definition on board would give a clear sense of the breakdowns that appear further on.
- Or, according to https://aclanthology.org/2022.semeval-1.188.pdf, provide an example-based definition for ABSA.
In this work, the authors consider tackling and describing ABSA as a problem in which the only given information is text.
Due to the latter, within the introduction, the authors break down ABSA into the ATE, OTE, and ASC subtasks and mention a pipeline ideology for solving them.
- The authors refer to MRC, while there is no definition or citation to other work in the introduction (line 70 and the related paragraph).
- Only later, from other studies such as https://arxiv.org/abs/2101.00816 (DualMRC in this paper), did I, as a reviewer, find that it refers to machine reading comprehension (MRC).
- I would recommend being clear on that, since this concept represents a main contribution but is vaguely reviewed in the second paragraph of the introduction.

Contributions:
1 and 2. The most similar approaches in existing work were explored through GNNs.
https://aclanthology.org/2022.semeval-1.188.pdf
Other contributions:
Through Figure 3, it is possible to see that the authors contribute a mechanism for embedding extra metadata besides the text (MRC).
This is something we did not see in similar works that exploit BERT-based methods (DualBERT).

Review.
The authors decided to arrange 3 subsections for reviewing advances in
- ABSA
- ASTE
- Prompt Learning

Lines 194-196: the claim of "the first time the application of prompt learning to the ASTE task, leveraging the synergy of two prompt queries with varying levels of constraints to enhance sentiment" is not accurate in the context of the existence of DualMRC. Is there something I missed on my side or the other reviewers' side?
- https://arxiv.org/abs/2101.00816
- In this work, the authors refer to ASTE as `triplet` extraction.


Preliminary
Authors provide a definition for the task formulation (reference to the introduction comments).
The multi-round concept assumes application of different prompts in rounds for individual extraction of the triplet components (ASTE).
As in the previous sections of the paper, it is not clear what the definition of "dynamic" is. What's the meaning in the sense of the architecture and methodology?
By looking at Figure 3, the assumption is that we pass the answer of the static query.

Query design:
The idea is to pass 8 queries sequentially into the input of the designed framework.

Experimental dataset:
- Comment related to Table 1 contents:
- It is recommended to declare the policy on the train/dev/test split and to mention percentages for a preliminary understanding of the volumes.

Rest of the paper:
Experiments: I see that DualMRC is the closest study/model, whose results serve as a baseline for the prompt-learning model (this paper).
Thus, we expected to see a clear match between the parameters used in this study and those of DualMRC.
In the implementation details section, it is possible to find the BERT parameter choices.
At first glance, it seems clear: the scale of the model is aligned with the one in DualMRC by using bert-base-uncased.
I would encourage the authors to give a clear explanation of the source of the results for DualMRC.
For several models listed in the results table, the authors mark the source of the results (#), while it does not look like DualMRC has been reproduced.
Evidence of the latter is absent from the shared repository for this paper, which is why I expect comments from the authors.
Did you reproduce DualMRC in your evaluation context, or refer to the results mentioned in the original study?


Ablation: The authors provide ablation studies that involve applying the model without features such as POS and the syntax tree.
The ablation demonstrates the effectiveness of embedding the ST and POS features.

Experimental design

The design of the experiments is quite straightforward. The authors provide details on the utilized datasets and mention their statistics. A list of other frameworks for comparison with PromptReader is provided too. The authors provide ablation studies, which additionally prove the effectiveness of the exploited embedding features.

Validity of the findings

The studies could be reproduced via the provided links to the model and the datasets.
Through that, I see the validity of the proposed system.

Additional comments

I would place rather strong emphasis on the novelty with respect to the Dual-MRC studies (cited in this paper). We see added features and multi-round training. Is there anything else that has been missed in the context of novelty over the existing Dual-MRC? The authors should clearly emphasize this.

Ethics and transparency of the organized studies:
- The authors made the resources utilized for organizing the experiments publicly available (hosted on the GitHub platform).

Minor comments.
---------------
- 498-499 "our model performs well and achieves comparable performance to the larger BERT models on certain datasets":
- How well? Quantitative statistics? Such claims are expected to have more concrete support.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.