Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on July 17th, 2025 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on September 18th, 2025.
  • The first revision was submitted on October 3rd, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on November 5th, 2025.

Version 0.2 (accepted)

Academic Editor

Accept

The authors have addressed all the reviewers' concerns. The manuscript is ready for publication; however, it may require unifying the fonts and font sizes in the figures and tables.

[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]

**PeerJ Staff Note:** Although the Academic and Section Editors are happy to accept your article as being scientifically sound, a final check of the manuscript shows that it would benefit from further editing. Therefore, please identify necessary edits and address these while in proof stage.

·

Basic reporting

Dear authors,

I appreciate the authors’ dedication in addressing my concerns in the revised manuscript. At this stage, I have no further comments.

Good luck.

Experimental design

-

Validity of the findings

-

Additional comments

-

Reviewer 3 ·

Basic reporting

My questions have been addressed.

Experimental design

no comment

Validity of the findings

no comment

Version 0.1 (original submission)

Academic Editor

Major Revisions

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**PeerJ Staff Note:** PeerJ's policy is that any additional references suggested during peer review should only be included if the authors find them relevant and useful.

**Language Note:** When preparing your next revision, please ensure that your manuscript is reviewed either by a colleague who is proficient in English and familiar with the subject matter, or by a professional editing service. PeerJ offers language editing services; if you are interested, you may contact us at [email protected] for pricing details. Kindly include your manuscript number and title in your inquiry. – PeerJ Staff

Reviewer 1 ·

Basic reporting

The paper introduces a novel framework, CoAgt, designed to enhance the ability of large language models (LLMs) to reason over long tabular data without relying on SQL generation. Inspired by human cognitive processes, CoAgt employs a chain of specialized agents (Collector, Synthesiser, and Answer Refiner) to segment, analyze, and synthesize information from large tables. The framework achieves state-of-the-art performance on benchmark datasets WikiTableQuestions (85.4% accuracy) and TabFact (96.5% accuracy), demonstrating scalability and robustness, particularly for tables exceeding token limits of standard LLMs.
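For readers who want a concrete picture of the chained-agent pattern described above, a minimal sketch follows. All function names, prompts, and the `call_llm` helper are hypothetical, inferred from this summary rather than taken from the authors' implementation; the actual CoAgt prompts and agent interfaces may differ.

```python
# Minimal sketch of a Collector -> Synthesiser -> Answer Refiner chain for
# table question answering. Names, prompts, and `call_llm` are hypothetical;
# they illustrate the pattern summarized above, not the authors' code.
from typing import Callable, List

def collect(chunks: List[str], question: str, call_llm: Callable[[str], str]) -> List[str]:
    """Each Collector call scans one table segment for question-relevant facts."""
    prompt = ("Extract the rows and values relevant to the question.\n"
              "Question: {q}\nTable segment:\n{t}")
    return [call_llm(prompt.format(q=question, t=chunk)) for chunk in chunks]

def synthesise(findings: List[str], question: str, call_llm: Callable[[str], str]) -> str:
    """The Synthesiser merges per-segment findings into a draft answer."""
    prompt = ("Combine the partial findings below and answer the question.\n"
              f"Question: {question}\nFindings:\n" + "\n".join(findings))
    return call_llm(prompt)

def refine(draft: str, question: str, call_llm: Callable[[str], str]) -> str:
    """The Answer Refiner normalises the draft into a short final answer."""
    return call_llm(f"Question: {question}\nDraft answer: {draft}\n"
                    "Return only the final answer.")

def coagt_pipeline(table_chunks: List[str], question: str,
                   call_llm: Callable[[str], str]) -> str:
    findings = collect(table_chunks, question, call_llm)
    draft = synthesise(findings, question, call_llm)
    return refine(draft, question, call_llm)
```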

Although most parts of the paper are well written, there are some concerns the authors should address:

1. While the paper draws inspiration from human reasoning (scanning, comparison, memory), it could more explicitly link cognitive theories (e.g., working memory models, dual-process theory) to justify the agent roles (Collector = pattern recognition, Synthesizer = decision-making).

2. The introduction briefly mentions schema misalignment and token limits but could expand on why LLMs struggle with relational reasoning (e.g., lack of 2D structural biases, linearization artifacts).

3. Some baselines (e.g., Chain-of-Table, StructGPT) are described as "strong," but their performance gaps (e.g., Chain-of-Table: 59.9% vs. CoAgt: 85.4%) are not contextualized. Are these differences statistically significant?

4. The choice of 1,000 tokens/agent is pragmatic but lacks empirical justification. A sensitivity analysis (e.g., accuracy vs. token chunk size) would strengthen the design.

5. The prompts (Figures 2–5) are detailed but lack ablation. For example, does step-by-step reasoning ("think step by step") significantly impact performance?

6. The experiments focus on Wikipedia-style tables. How does CoAgt handle noisy or sparse tables (e.g., missing headers, merged cells), common in real-world datasets like Excel or HTML tables?

7. The framework is evaluated on fact-checking (TabFact) and QA (WikiTQ). Could it extend to other tasks (e.g., table summarization, data imputation)? A discussion would broaden the impact.

8. LLMs may inherit biases from training data. Does CoAgt propagate or mitigate biases when processing tables (e.g., demographic data)?

9. The multi-agent system likely increases inference time/compute costs. A comparison of latency/token usage vs. baselines would help practitioners assess trade-offs.

10. The hyperparameter choices, such as the temperature settings (0.2 for the Collector, 0.5 for the Synthesiser), seem arbitrary. Justify these choices or test robustness to variations.

11. The quality of the figures, especially Figure 1, should be better.

12. Tables 1–4 are clear, but the narrative (Section 5) could better highlight key takeaways (e.g., "CoAgt’s advantage grows with table size").

13. Proofread the entire paper for typos, grammatical errors, punctuation, etc. For example:
• Page 7, Line 155: "introdue" → "introduce"
• Page 16, Ref [5]: "TABFACT: A LARGE-SCALE DATASET..." → Use standard title capitalization.

14. Cross-reference all the references, figures, tables, sections/sub-sections, etc., for better traceability.

15. Define each acronym only at its first appearance and avoid repetition (e.g., "LLM" has been defined six times in the whole paper!).

16. Consider adding an abbreviation section for clarity.

Experimental design

-

Validity of the findings

-

Reviewer 2 ·

Basic reporting

-

Experimental design

-

Validity of the findings

-

Additional comments

This paper uses a team of agents powered by LLMs to conduct table reasoning processes that mirror the human brain’s cognition and behavior. These individual table agents are responsible for specialized functions, while a final agent consolidates their results to produce the final answer. Experiments on WikiTQ and TabFact validate the effectiveness of CoAgt.

There are still some issues that should be addressed:

1. The authors designed CoAgt to replicate the human brain’s cognition and behavior. However, there is a lack of evidence or references in the manuscript to support or explain how the brain organizes these mental functions to collaboratively accomplish the goal.

2. CoAgt is composed of multiple agents responsible for different functions, which may incur substantial computational cost. However, this study does not present an efficiency analysis of the proposed CoAgt. In addition, some previous approaches (e.g., Potable) have attempted to use only one LLM agent to handle each of the separate analytical stages, which offers high generation efficiency. Compared with them, what is the advantage of CoAgt?

3. Experimental results are not comprehensive. It is suggested that the authors further investigate the effectiveness of the different agent components (e.g., the Collector, Synthesizer, and Answer Refiner agents) and add failure case studies.

4. Some writing flaws need correction. For example, the abstract is not concise enough in summarizing the research background and motivation. In addition, the dataset names “WikiTabQA” and “WikiTQ” are not used consistently in the manuscript.

Reviewer 3 ·

Basic reporting

Regarding "Literature references, sufficient field background/context provided", this submission does not meet the requirement for sufficient literature references and field context.

Reason: The paper fails to cite a similar existing method, PieTa [1]. Both PieTa and the proposed method, CoAgt, operate on the principle of decomposing a large table into subtables to derive answers, and neither requires SQL.

[1] Piece of Table: A Divide-and-Conquer Approach for Selecting Subtables in Table Question Answering

Experimental design

Regarding "Rigorous investigation performed to a high technical & ethical standard", this submission does not meet the requirement for a rigorous technical investigation.

Reason:

1. As mentioned in the first point, the authors should include PieTa [1] as a baseline for a more comprehensive comparison of the proposed method's performance.

2. In the "Scalability and Token Management" subsection, the authors state, "In our experiments, we evaluated various token limits – 1,000, 1,500, 2,000, and 2,500 tokens per agent. The most stable and accurate performance was observed when each agent’s input was limited to approximately 1,000 tokens." It is recommended that the experimental results for these varying token limits be explicitly included in the paper. Furthermore, an analysis of why a 1,000-token limit yields the most stable and accurate performance should be provided (a rough illustration of token-bounded chunking is sketched after the reference below).

[1] Piece of Table: A Divide-and-Conquer Approach for Selecting Subtables in Table Question Answering
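As a concrete illustration of the token-bounded segmentation discussed in point 2 above, the following is a minimal sketch of splitting a linearised table into row-aligned chunks of at most `max_tokens` tokens. The `chunk_table` function and the `count_tokens` callable are hypothetical stand-ins; the paper's actual tokenizer and chunking policy may differ.

```python
# Hypothetical sketch: split a linearised table into chunks of at most
# `max_tokens` tokens, keeping whole rows together. The tokenizer is passed
# in as `count_tokens`; the paper's actual chunking policy may differ.
from typing import Callable, List

def chunk_table(rows: List[str], max_tokens: int,
                count_tokens: Callable[[str], int]) -> List[str]:
    chunks, current, current_tokens = [], [], 0
    for row in rows:
        row_tokens = count_tokens(row)
        # Start a new chunk if adding this row would exceed the budget.
        if current and current_tokens + row_tokens > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(row)
        current_tokens += row_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks

# Example with a crude whitespace tokenizer; a real tokenizer (e.g. tiktoken)
# could be swapped in for `count_tokens`.
if __name__ == "__main__":
    rows = ["year | city | medals", "2016 | Rio | 46", "2020 | Tokyo | 39"]
    print(chunk_table(rows, max_tokens=10, count_tokens=lambda s: len(s.split())))
```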

Validity of the findings

-

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.