All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Thank you for resubmitting the revision and for your patience. I am happy to recommend acceptance. Congratulations to all the authors. I wish you all the best in your future research.
[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]
Thank you for taking my remarks into account; the revised paper shows significant improvements, and all issues were addressed to a satisfactory extent. The literature review is sufficient and relevant and refers to recent publications. The structure is sound and the figures were updated. As before, raw data and the relevant repository were shared, and the results are relevant to the hypothesis.
As before, the research question is well defined and meaningful. Furthermore, the methodology was updated in a meaningful manner, with insights and further explanations regarding the validity of the translation and the process that was followed.
Regarding your findings, the updates are meaningful and on-point, explaining the limitations of the human evaluators. Thank you for taking my suggestions into account.
No additional comments, the revised version of the paper provides appropriate, significant improvements overall.
clearly written, good lit review
well executed
useful findings
Dear Authors
Kindly revise the paper in line with the reviewers' comments. Also, improve the related work section by including closely related articles to strengthen it (check Scopus for relevant articles published between 2023 and 2025).
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
Your paper is clear and well written, with good use of the English language. The literature review is sufficient and relevant, although it would help if there were more references from the last two years. The structure is sound; however, a small suggestion for improvement would be to change Figures 3 and 4 to simple bar charts, as I'm not sure how the color coding helps. Raw data and the relevant repository were shared, and the results are relevant to the hypothesis.
The research question is well defined and meaningful, in the sense that research on AES for essays in languages other than English, and especially for languages like Nepali with limited dataset availability, is always important.
Regarding your methodology, one comment:
While, as you describe, the creation of a new, extensive AES dataset for Nepali is a great idea, the fact that this dataset was created through machine translation is a concern, since the translation somewhat impacts the quality of the dataset, as you also found yourselves in the results and analysis of the evaluation. For example, features of the language that are useful for AES (e.g., grammatical errors or typos) might not be captured well by the machine translation. Another approach to create such a dataset, which might potentially provide better results, could be to retrieve a smaller number of genuine Nepali essays and then extend this dataset with synthetic data based on the retrieved essays, e.g., with the help of an LLM; a rough sketch of one such augmentation route follows below.
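To make this suggestion concrete, here is a minimal sketch of one possible augmentation route. It is illustrative only: instead of prompting an LLM, it produces synthetic variants of genuine Nepali essays by round-trip translation with mBART-50 (one of the models the paper already mentions). The checkpoint name and language codes are assumptions about the public HuggingFace release.

```python
# Illustrative sketch: extend a small set of genuine Nepali essays with synthetic
# paraphrased variants via round-trip translation with mBART-50. The checkpoint
# name and language codes below are assumptions, not taken from the paper.
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

MODEL_NAME = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(MODEL_NAME)
model = MBartForConditionalGeneration.from_pretrained(MODEL_NAME)

def translate(text: str, src_lang: str, tgt_lang: str) -> str:
    """Translate one text between mBART-50 language codes (e.g. 'ne_NP', 'en_XX')."""
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    generated = model.generate(
        **inputs, forced_bos_token_id=tokenizer.lang_code_to_id[tgt_lang]
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

def synthetic_variant(nepali_essay: str) -> str:
    """Nepali -> English -> Nepali round trip, yielding a paraphrased synthetic essay."""
    english = translate(nepali_essay, "ne_NP", "en_XX")
    return translate(english, "en_XX", "ne_NP")
```

Variants produced this way would still need the same human quality checks as the translated dataset, but they remain anchored to authentic Nepali writing rather than to translated English essays.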
Regarding your findings, you mention that:
"We evaluated different random sets of the translated collection by cooperating with skilled bilingual individuals and professional academics to verify accuracy and reliability. This validation method includes examining the quality of the translated content through the opinions of language specialists and academic professionals. The quality of the content was found to be appropriate and to be articulating the context well."
While later, you deduce that:
"In the NepAES study, the dataset was created by translating an existing dataset (ASAP) into Nepali using translation models like Google Translate and mBART-50. The accuracy of these translations directly influenced the scoring effectiveness, with translation inaccuracies potentially leading to scoring errors. Any inaccuracies in translation could significantly impact the scoring models’ performance."
These statements are a bit contradictory and raise some concerns. If human evaluators found the translations reliable, why would errors from automatic translation still be a major concern? Either the validation wasn’t thorough, or the results show that the human validation didn’t fully catch the impact of translation flaws on model performance. I'd suggest that you rephrase the first part and try to better explain how the validation of the translations was done, and why it might be insufficient.
The article is well-written, and the methodology for running an AES evaluation for specific models/features is sound. However, my main concern has to do with the methodology for creating and validating the translated dataset, as it might lead to less valuable results. I'd suggest that, if feasible, you try to either:
- Provide further details on how the validation happened, what the findings were, and how they impacted the results
- Create a dataset that is more robust to the impact of translation.
Good research idea, well-written introduction, and compelling narrative.
However, the formulaic listing of ML methods (pages 10 and 11) is not helpful, and Figure 2 (page 12), a high-level architecture of a transformer language model, doesn't add much to the article.
The design is sound but could be better described. Quadratic weighted kappa (QWK) is an appropriate measure, but other agreement measures may be considered alongside it to improve the interpretation of agreement.
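For reference, QWK can be computed directly with scikit-learn, and correlation-based measures can be reported alongside it. The sketch below is illustrative only; the scores are made up, not taken from the study.

```python
# Illustrative only: quadratic weighted kappa (QWK) plus two correlation-based
# agreement measures, on hypothetical human and predicted essay scores.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr, spearmanr

human_scores = [2, 3, 4, 4, 1, 3, 2, 5]      # hypothetical human ratings
predicted_scores = [2, 3, 3, 4, 2, 3, 2, 4]  # hypothetical model predictions

qwk = cohen_kappa_score(human_scores, predicted_scores, weights="quadratic")
pearson_r, _ = pearsonr(human_scores, predicted_scores)
spearman_rho, _ = spearmanr(human_scores, predicted_scores)
print(f"QWK={qwk:.3f}  Pearson r={pearson_r:.3f}  Spearman rho={spearman_rho:.3f}")
```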
This is a useful study that provides a valid approach to assessing the quality of machine scoring with various methods on an interesting dataset: a corpus of scored essays in a low-resource language.