All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Dear Author,
Your revised paper has been accepted for publication in PeerJ Computer Science. Thank you for your fine contribution.
[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]
The authors have thoroughly addressed all of my comments.
All my comments are addressed.
The paper is ready for publication.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
I commend the authors for thoroughly addressing my concerns. I have no further comments, except for one minor one regarding citations.
The manuscript cites Kim, Muhn, and Nikolaev (2024) multiple times. However, the authors of that preprint have withdrawn it from public circulation. According to their statement at https://arxiv.org/abs/2407.17866:
"A co-author identified inconsistencies in the data and analyses while attempting to replicate past analyses from the working paper. Accordingly, we have temporarily withdrawn the working paper from circulation while we review the research findings."
Given this withdrawal, I recommend finding an alternative citation to support the relevant points in the manuscript.
Reference: Kim, A. G., Muhn, M., & Nikolaev, V. V. (2024). Financial Statement Analysis with Large Language Models. arXiv preprint arXiv:2407.17866. [Withdrawn]
The authors have addressed all my concerns.
The revision is improved.
Show some best and worst results and analyze these cases in detail.
The authors should conduct more experiments and compare the results with other research on the same topic.
Limitations and future direction of research should be expanded.
More figures should be used to visualize the method and the results.
**Language Note:** When you prepare your next revision, please either (i) have a colleague who is proficient in English and familiar with the subject matter review your manuscript, or (ii) contact a professional editing service to review your manuscript. PeerJ can provide language editing services - you can contact us at [email protected] for pricing (be sure to provide your manuscript number and title). – PeerJ Staff
• Writing quality – The prose is clear and professional.
• Structure & referencing – The introduction offers solid background and a concise summary. The manuscript follows PeerJ’s section order, with current, relevant citations and Zenodo links for code and data.
• Figures & tables – Figure 1 is fully legible, and all tables are well laid out and self-contained.
• Research question – The study asks an original, well-motivated question: Can large language models extract return-predictive sentiment from Japanese 10-K MD&A sections?
• Dataset & scope – The sample covers 11,135 firm-years (2014–2023) for March fiscal-year-end firms; the scope is appropriate and transparently justified. Clarify why the series starts in 2014 and explain the sharp drop in observations in 2023 versus 2022.
• Reproducibility – Code, data, and model checkpoints are provided.
• Evaluation choices – Long–short portfolios and factor alphas are suitable. A brief note on why value weighting, not equal weighting, was chosen would complete the picture.
• Economic magnitude – Annualized alphas of -6% to -10% are economically large; comparing them with well-known anomalies (e.g., size or momentum) would contextualize their importance.
• Robustness – Add (i) equal-weight portfolios and (ii) a simple transaction-cost adjustment to gauge implementability.
1. Sign interpretation – Explain briefly why a higher tone score predicts lower future returns (scoring scale vs. behavioral story).
2. Standard-error specification – State lag length and any clustering.
3. Hyperparameters – Report full API settings for each LLM.
4. Tone coding – Specify how each word is scored (e.g., +1 = positive, 0 = neutral, -1 = negative).
5. Replace “GPT-4-mini” with “GPT-4o-mini.”
6. In Table 2, explain why the means of Tone Ratio and Tone Score are negative, whereas those from the language models are all positive.
This study set out to examine whether advanced large language models (LLMs) can extract return-predictive information from Japanese 10-K reports. Below are some comments.
In sentiment analysis, sentiment is not always represented with a categorical model (e.g., positive vs. negative). Dimensional models have been proposed that represent sentiment as scores along multiple dimensions (e.g., valence and arousal). The authors should discuss dimensional sentiment analysis (e.g., the multi-dimensional relation model, refined word embeddings, the Chinese EmoBank).
This paper uses three LLMs (ChatGPT, Claude, and Gemini) to extract sentiment from Japanese 10-K reports for predicting future stock returns. The authors conducted experiments comparing these models with traditional dictionary-based methods and a DeBERTaV2-based model to evaluate the information extraction.
The paper does not explain in detail why these three LLMs (ChatGPT, Claude, and Gemini) were chosen for sentiment extraction.
The prompt is not presented in detail.
The two baseline methods, traditional dictionary-based approaches and the DeBERTaV2-based model, are not well explained.
The evaluation of the extracted sentiment is not well presented.
The results are not analyzed in detail.
The discussion of why GPT-4 and Claude achieved better results is insufficient.
The results are not compared with any related research.
The paper's technical contribution is weak; it lacks novelty, and the experiments are not convincing.
The authors should add more recent studies to the references.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.