All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Thank you for resubmitting the revision and for your patience. I am happy to recommend acceptance. Congratulations to all the authors. I wish you all the best in your future research.
[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]
Thank you for taking my remarks into account; the revised paper shows significant improvements, and all issues were addressed to a satisfactory extent. The literature review is sufficient and relevant and refers to recent publications. The structure is sound and the figures were updated. As before, raw data and the relevant repository were shared, and the results are relevant to the hypothesis.
As before, the research question is well defined and meaningful. Furthermore, the methodology was updated in a meaningful manner, with insights and further explanations regarding the validity of the translation and the process that was followed.
Regarding your findings, the updates are meaningful and on-point, explaining the limitations of the human evaluators. Thank you for taking my suggestions into account.
No additional comments, the revised version of the paper provides appropriate, significant improvements overall.
clearly written, good lit review
well executed
useful findings
Dear Authors
Kindly revise the paper in line with the reviewers' comments. Also, improve the related work section by including closely related articles to strengthen it (check Scopus for relevant articles published between 2023 and 2025).
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
Your paper is clear and well written, with good use of the English language. The literature review is sufficient and relevant, although it would help if there were more references from the last two years. The structure is sound; however, a small suggestion for improvement would be to change Figures 3 and 4 to simple bar charts, as I'm not sure how the color coding helps. Raw data and the relevant repository were shared, and the results are relevant to the hypothesis.
The research question is well defined and meaningful, in the sense that research on AES for essays in languages other than English, and especially for languages like Nepali with limited dataset availability, is always important.
Regarding your methodology, one comment:
While, as you describe, the creation of a new, extensive AES dataset for Nepali is a great idea, the fact that this dataset was created through machine translation is a concern, since the translation somewhat impacts the quality of the dataset, as you also found yourselves in the results and analysis of the evaluation. For example, features of the language that are useful for AES (e.g., grammatical errors or typos) might not be captured well by the machine translation. Another approach to create such a dataset, which might potentially provide better results, could be to retrieve a smaller number of genuine Nepali essays and then extend this dataset with synthetic data based on the retrieved essays, e.g., with the help of an LLM; a rough sketch of one such augmentation route follows below.
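To make this suggestion concrete, here is a minimal sketch of one possible augmentation route. It is illustrative only: instead of prompting an LLM, it produces synthetic variants of genuine Nepali essays by round-trip translation with mBART-50 (one of the models the paper already mentions). The checkpoint name and language codes are assumptions about the public HuggingFace release.

```python
# Illustrative sketch: extend a small set of genuine Nepali essays with synthetic
# paraphrased variants via round-trip translation with mBART-50. The checkpoint
# name and language codes below are assumptions, not taken from the paper.
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

MODEL_NAME = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(MODEL_NAME)
model = MBartForConditionalGeneration.from_pretrained(MODEL_NAME)

def translate(text: str, src_lang: str, tgt_lang: str) -> str:
    """Translate one text between mBART-50 language codes (e.g. 'ne_NP', 'en_XX')."""
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    generated = model.generate(
        **inputs, forced_bos_token_id=tokenizer.lang_code_to_id[tgt_lang]
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

def synthetic_variant(nepali_essay: str) -> str:
    """Nepali -> English -> Nepali round trip, yielding a paraphrased synthetic essay."""
    english = translate(nepali_essay, "ne_NP", "en_XX")
    return translate(english, "en_XX", "ne_NP")
```

Variants produced this way would still need the same human quality checks as the translated dataset, but they remain anchored to authentic Nepali writing rather than to translated English essays.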
Regarding your findings, you mention that:
"We evaluated different random sets of the translated collection by cooperating with skilled bilingual individuals and professional academics to verify accuracy and reliability. This validation method includes examining the quality of the translated content through the opinions of language specialists and academic professionals. The quality of the content was found to be appropriate and to be articulating the context well."
While later, you deduce that:
"In the NepAES study, the dataset was created by translating an existing dataset (ASAP) into Nepali using translation models like Google Translate and mBART-50. The accuracy of these translations directly influenced the scoring effectiveness, with translation inaccuracies potentially leading to scoring errors. Any inaccuracies in translation could significantly impact the scoring models’ performance."
These statements are a bit contradictory and raise some concerns. If human evaluators found the translations reliable, why would errors from automatic translation still be a major concern? Either the validation wasn’t thorough, or the results show that the human validation didn’t fully catch the impact of translation flaws on model performance. I'd suggest that you rephrase the first part and try to better explain how the validation of the translations was done, and why it might be insufficient.
The article is well-written, and the methodology for running an AES evaluation for specific models/features is sound. However, my main concern has to do with the methodology for creating and validating the translated dataset, as it might lead to less valuable results. I'd suggest that, if feasible, you try to either:
- Provide further details on how the validation happened, what the findings were, and how they impacted the results
- Create a dataset that is more robust to the impact of translation.
Good research idea, well-written introduction, and compelling narrative.
However, the formulaic listing of ML methods (pages 10 and 11) is not helpful, and Figure 2 (page 12), a high-level architecture of a transformer language model, doesn't add much to the article.
The design is sound but could be better described. Quadratic weighted kappa (QWK) is an appropriate measure, but other agreement measures may be considered alongside it to improve the interpretation of agreement.
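For reference, QWK can be computed directly with scikit-learn, and correlation-based measures can be reported alongside it. The sketch below is illustrative only; the scores are made up, not taken from the study.

```python
# Illustrative only: quadratic weighted kappa (QWK) plus two correlation-based
# agreement measures, on hypothetical human and predicted essay scores.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr, spearmanr

human_scores = [2, 3, 4, 4, 1, 3, 2, 5]      # hypothetical human ratings
predicted_scores = [2, 3, 3, 4, 2, 3, 2, 4]  # hypothetical model predictions

qwk = cohen_kappa_score(human_scores, predicted_scores, weights="quadratic")
pearson_r, _ = pearsonr(human_scores, predicted_scores)
spearman_rho, _ = spearmanr(human_scores, predicted_scores)
print(f"QWK={qwk:.3f}  Pearson r={pearson_r:.3f}  Spearman rho={spearman_rho:.3f}")
```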
This is a useful study that provides a valid approach to assessing the quality of machine scoring with various methods on an interesting dataset: a corpus of scored essays in a low-resource language.