Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on December 28th, 2022 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on March 6th, 2023.
  • The first revision was submitted on March 13th, 2023 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on March 15th, 2023.

Version 0.2 (accepted)

· Mar 15, 2023 · Academic Editor

Accept

I recommend it for publication.

[# PeerJ Staff Note - this decision was reviewed and approved by Rong Qu, a PeerJ Computer Science Section Editor covering this Section #]

Version 0.1 (original submission)

· Mar 6, 2023 · Academic Editor

Minor Revisions

Please revise your paper according to the reviewers' comments.
Thank you very much.

[# PeerJ Staff Note: Please ensure that all review comments are addressed in a rebuttal letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. It is a common mistake to address reviewer questions in the rebuttal letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the rebuttal letter. Directions on how to prepare a rebuttal letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]

·

Basic reporting

The English level of the article is satisfactory. Occasionally, some phrases need to be rewritten to eliminate redundancy, e.g., "The meaning and diversity of paraphrases is evaluated with human evaluations conducted through crowd-sourcing and automatic evaluation."
References are sufficient, and the state-of-the-art (SOA) coverage is good.
Figures are readable, and tables are well explained.
The study hypotheses are presented adequately, and the results show the merit of the research.

Experimental design

The aim of the research is congruent with the purpose of the journal. The only issue I find here is the use of Amazon Mechanical Turk (MTurk). I would have first tried to gather data from volunteers. However, my guess is that your approach does not influence the results of the study at all. Nevertheless, I encourage you to further justify the choice of MTurk over other means. Another issue concerns the ethical consent of the humans involved in any part of the research process; I was wondering whether you requested this prior to their actual involvement.

Validity of the findings

The data in the files supplied is relevant. Conclusions are well written.

Reviewer 2 ·

Basic reporting

This paper provides a new dataset called Expanded CAsT (ECAsT) for the Text REtrieval Conference (TREC) Conversational Assistance Track (CAsT). According to the authors, ECAsT is 665% more extensive in terms of size and language diversity than the CAsT dataset and contains more than 9,200 turns. The authors also mention that they use ECAsT to assess the robustness of the traditional metrics for conversational evaluation used in CAsT and to identify their bias toward language diversity. The authors claim that introducing language diversity via paraphrases in the ECAsT dataset returned up to 24% new passages, compared to only 2% using the CAsT baseline.

In general, I find this paper somewhat ambiguous and unclear. I did not fully understand how the ECAsT dataset is generated or what the additional text/semantic resources of the dataset are: does ECAsT come entirely from expanding the CAsT dataset using rules? The authors may want to discuss the resources used to generate the new dataset in more detail.

The references are sufficient in this paper. The background introduction to ECAsT is fairly detailed.

The paper follows a standard structure: introduction, literature review, methodology, experiments, and discussion. Enough detail is provided on how the proposed new dataset is evaluated using different metrics.

In general, the authors should provide more details on how the new dataset ECAsT is built: what are the additional text/semantic resources (beyond CAsT) used to generate ECAsT? That said, given the evaluation results using the metrics, I believe the new dataset is indeed more extensive in terms of size and language diversity compared to CAsT.

So, I recommend a minor revision of this paper: please provide more details on how the new dataset ECAsT is built. What are the text/semantic resources of the dataset?

Experimental design

The authors build the new dataset ECAsT based on the existing benchmark dataset CAsT. Roughly speaking, there are five stages:

Stage 1: Conversational Query Reformulation.
Stage 2: Paraphrasing CAsT Dataset.
Stage 3: Human Evaluation (evaluate generated paraphrases through crowd-sourcing human annotators).
Stage 4: Data Cleaning and Diversification.
Stage 5: Reformulation, Retrieval, and Re-Ranking.

Enough detail is provided for each stage. However, after reading through the five stages, I still cannot figure out what the text/semantic resources for generating ECAsT are, beyond the initial CAsT dataset. The authors should briefly discuss this issue.
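
To make the request concrete, below is a minimal, purely illustrative sketch of what Stage 2 (paraphrasing a context-resolved CAsT turn) could look like using a generic sequence-to-sequence paraphrase model. The model checkpoint, prompt prefix, and function names are placeholders of my own choosing, not the authors' actual implementation; in practice a paraphrase-tuned checkpoint and the authors' own filtering would be needed.

    # Illustrative only: a generic paraphrase-expansion step, NOT the authors' code.
    # "t5-base" is a placeholder; a paraphrase-tuned seq2seq checkpoint is assumed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    MODEL_NAME = "t5-base"  # placeholder checkpoint (assumption)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    def paraphrase(query: str, n: int = 5) -> list[str]:
        """Generate n candidate paraphrases of a context-resolved CAsT turn."""
        inputs = tokenizer("paraphrase: " + query, return_tensors="pt")
        outputs = model.generate(
            **inputs,
            num_beams=n,
            num_return_sequences=n,
            max_new_tokens=64,
        )
        return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

    # Example: expand one reformulated turn into several surface variants,
    # which would then pass through human evaluation and cleaning (Stages 3-4).
    print(paraphrase("What are the main causes of throat cancer?"))

A sketch like this would also make explicit which external resources (if any) the paraphrase model brings in beyond the CAsT turns themselves, which is precisely the point I would like the authors to clarify.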

Validity of the findings

The authors provide a detailed description of how the proposed dataset is evaluated with respect to context removal, paraphrase evaluation, paraphrase diversity, un-judged passage analysis, and the effect of the new QRELs on performance. Results show that ECAsT outperforms the CAsT dataset according to the proposed metrics.

Therefore, I think in general the dataset is valuable.

Additional comments

No additional comments.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.