All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Based on the reviewer’s assessment, the authors have successfully addressed all prior concerns. The manuscript now demonstrates corrected formal issues, includes the missing Methodology section, and provides an enhanced description of the evaluation process. With no remaining comments on the validity of the findings, I endorse the decision to accept this article for publication.
[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]
The authors have corrected formal errors in the article.
The authors have added the missing Methodology chapter, as well as a more detailed description of the evaluation of word ambiguity in translation from Hindi to Dogri.
No comments.
Thank you for your substantial revisions and your detailed responses to the reviewers' feedback. I am pleased to inform you that your manuscript is nearly ready to be accepted for publication.
However, Reviewer 1 noted lingering issues related to citation accuracy, structure, and evaluation clarity (particularly regarding ambiguity and BLEU scoring).
For example: Reference [22] is a German-language physics paper from 1922 entitled "Spark lines in the X-ray spectrum", which does not seem to be related to the present article. https://onlinelibrary.wiley.com/doi/10.1002/andp.19223712302
The document does not meet the formal requirements. In the text, the author refers to figures, tables, or sources under numbers that differ from those actually used. Figures and tables are ordered according to an unclear scheme (figure, table, table, figure, ...). The entries in the reference section are incomplete or inaccurate.
The authors refer to sources [25], [26], [27], and [28], which are supposed to contain information on the evaluation of MT in terms of fluency, adequacy, and ambiguity. The sources listed do not contain any information on ambiguity. Overall, the assessment of ambiguity in the work is vague; the authors could have given an example of where ambiguity occurs in the translation, ideally in English with an explanation (for instance, a lexically ambiguous word analogous to English "bank", which can denote either a financial institution or a riverbank). For a reader who does not speak Hindi, this information is impossible to verify.
The authors refer to source [22], which is supposed to contain research related to the BLEU metric; however, the link provided points to a different article. Given that the BLEU metric is commonly used in machine translation evaluation, the authors should be familiar with this source and able to cite it properly.
The authors refer to source [23], which is supposed to contain research related to MT fluency and adequacy. However, the cited work is devoted to the categorization of MT errors and does not address fluency or adequacy at all.
The authors did not follow the logical structure of a research paper: they first present the results and only then the methodology. In fact, in the Methodology section they justify the choice of research tools based on their own findings from this very study.
In the BLEU score analysis section you state that "RBMTS consistently achieved the highest BLEU score across all sentence lengths..."; however, for medium-length sentences the RNN achieved the best score (54.73), which is higher than the RBMTS score.
For fluency and adequacy you used the well-established Likert scale. On what basis did you determine the ambiguity score?
The authors have fully addressed all previous comments regarding clarity, formatting, and completeness. The manuscript now fully meets the expectations for basic reporting.
All reviewer concerns regarding the experimental setup, dataset preparation, and model implementation have been addressed. The authors clarified hyperparameters, training settings, and preprocessing methods. They also improved the evaluation design, including a more detailed explanation of the human judgment protocol and mention of participating raters. The experimental design is now rigorous, transparent, and sufficiently documented.
The findings are now more valid and better supported by the clarified methodology and open data.
The authors have carefully and thoroughly responded to all reviewer comments. They revised the manuscript substantially to improve its structure, clarity, reproducibility, and presentation. All technical and editorial concerns, including figure clarity, dataset access, formatting, and evaluation protocols, have been satisfactorily addressed. I confirm that no outstanding issues remain, and the revised version is ready for publication from a peer-review standpoint.
Dear Authors,
Thank you for submitting your manuscript. After carefully considering the reviewers’ feedback, I have concluded that your paper requires major revisions before it can be considered for publication. While several significant issues were raised, I would like to emphasize one in particular that demands your immediate attention:
Dataset Transparency: The reviewers underscore the necessity of providing clear information about—and access to—your dataset. Please offer a thorough description of your corpus development, including data sources, preprocessing steps, annotation methodology, and any other relevant details. If feasible, include a link or instructions that would enable other researchers to obtain or replicate this dataset.
We encourage you to address these concerns comprehensively and to submit your revised manuscript once the necessary amendments have been made.
Thank you again for the opportunity to review your work. We look forward to receiving your revised submission.
**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
- The document does not meet the formal requirements of the publisher, and separate Methodology and Discussion sections are missing. The reader must search the text for information about the methodology.
- The document contains terminology errors: "BUEL" instead of "BLEU" (line 302); source [19], which should refer to the BLEU metric, is incorrect; sources [13] and [14] refer to the same work; and the notation of source [22] is incorrect.
- Table II contains unknown characters.
- The authors refer (line 300) to Table VII; in fact, it is Table VIII.
- In the section devoted to the differences between the studied languages, feature VII is not mentioned, unlike the remaining listed items.
- In lines 206-213, the text is highlighted in red for no apparent reason.
- Table VI is unclear. What do the numbers separated by commas represent?
- Inappropriate text markup: the struck-through text indicating incorrect translations (lines 190-200) is more difficult to read for a non-Indian reader.
- The authors confirmed the higher success rate of RBMT translation compared to SMT and NMT translation for the Hindi-Dogri language pair. However, the conclusion drawn from this finding is somewhat confusing: it first states that RBMT is better than SMT and NMT, and immediately afterwards states that SMT and NMT are worse than RBMT (lines 357-360). I recommend that this statement be elaborated and clarified.
The research builds on previous work by Dubey and Rakhra, but that work is not open access; therefore, the dataset is not easy to obtain.
It is not stated how many human raters participated in the research.
On what basis did you determine the final assessment of the approach's accuracy and fluency?
Clarity and Language
• The manuscript generally follows an academic structure; however, the English language requires significant improvement. Several sentences are grammatically incorrect or awkwardly phrased, which impacts readability and comprehension.
• Examples of unclear or awkward phrasing include:
o "Three main approaches of corpus-based MT techniques are very popular:..." → Consider revising to: "Three corpus-based MT approaches are commonly used:..."
o "Translation is a challenging task for both humans and machines..." → Could be made more concise and formal.
Structure and Formatting
• The manuscript structure adheres to standard scientific formatting with clearly labeled sections (Introduction, Methodology, Results, etc.).
• Subsections for each translation approach (RBMT, SMT, NMT) are logically organized.
• However, figure captions and table legends need to be improved for clarity and completeness.
Figures and Tables
• Figure 4 (Encoder-Decoder Architecture of NMT) lacks clarity. There is no explanation of the colors and shapes used, making it difficult for the reader to interpret the model architecture.
• TABLE I contains missing characters, which impacts readability and must be corrected.
• TABLE II, TABLE IV, and TABLE V are presented entirely in Hindi/Dogri scripts without English translations. For accessibility and to meet international publication standards, English translations should be included either in-line or as footnotes.
Literature and References
• The manuscript cites a wide range of relevant literature, covering foundational works in rule-based, statistical, and neural machine translation.
• However, some in-text references are inconsistently formatted and should be revised for proper citation style (e.g., [2][3], [7][8]).
Raw Data and Materials
• While a brief description of the dataset sources and structure is provided, the manuscript would benefit from a clearer explanation of how the corpus was constructed, its availability, and any preprocessing steps.
• A link to the dataset or a statement about how it can be accessed would increase transparency and replicability.
Originality and Scope
• The study addresses a relevant and underexplored topic in the field of machine translation: the development and evaluation of MT systems for the Hindi-Dogri language pair, which is considered a low-resource language scenario.
• The paper falls within the scope of the journal and presents original primary research that compares three major MT paradigms: Rule-Based (RBMT), Statistical (SMT), and Neural (NMT).
Research Question and Objectives
• The research objectives are clearly stated, and the study aims to answer whether NMT provides better translation results than SMT and RBMT for Hindi-Dogri text.
• The motivation is well-justified, especially given the lack of prior MT systems for Dogri.
Methodological Rigor
• All three approaches are implemented and tested using a shared corpus, which allows for fair comparison.
• The RBMT system is developed using bilingual dictionaries and handcrafted rules. The SMT system is implemented using the Moses toolkit, and four NMT models are developed using TensorFlow and Keras: Embedding LSTM, Bidirectional LSTM, Bidirectional Embedding LSTM, and Encoder-Decoder GRU (a minimal sketch of such a model follows this list).
• The use of multiple deep learning models adds depth to the experimental analysis.
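To make the model descriptions concrete, here is a minimal sketch of an encoder-decoder GRU of the kind listed above, using the TensorFlow/Keras stack the paper reports. The vocabulary sizes, embedding dimension, and hidden-unit count are illustrative assumptions, not the authors' settings.

```python
# Minimal encoder-decoder GRU sketch in Keras (illustrative only).
# Vocabulary sizes, embedding size, and hidden units are assumed values,
# not the settings reported in the manuscript.
from tensorflow.keras import layers, Model

SRC_VOCAB, TGT_VOCAB = 30_000, 30_000   # assumed vocabulary sizes
EMB_DIM, HIDDEN = 256, 512              # assumed model dimensions

# Encoder: embed the Hindi source sentence and run it through a GRU,
# keeping only the final hidden state as the sentence summary.
enc_inputs = layers.Input(shape=(None,), name="hindi_tokens")
enc_emb = layers.Embedding(SRC_VOCAB, EMB_DIM, mask_zero=True)(enc_inputs)
_, enc_state = layers.GRU(HIDDEN, return_state=True)(enc_emb)

# Decoder: embed the (shifted) Dogri target sentence and run a GRU
# initialised with the encoder state; predict the next token at each step.
dec_inputs = layers.Input(shape=(None,), name="dogri_tokens")
dec_emb = layers.Embedding(TGT_VOCAB, EMB_DIM, mask_zero=True)(dec_inputs)
dec_out = layers.GRU(HIDDEN, return_sequences=True)(dec_emb, initial_state=enc_state)
logits = layers.Dense(TGT_VOCAB, activation="softmax")(dec_out)

model = Model([enc_inputs, dec_inputs], logits)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```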
Corpus and Data Collection
• The dataset includes approximately 100,000 Hindi-Dogri parallel sentences, gathered from newspapers, journals, and a conversation book digitized via OCR.
• Although suitable for SMT, the corpus size is relatively small for training deep neural models, which may explain the limited performance of NMT models.
• The data undergoes preprocessing steps, such as tokenization, cleaning, and normalization, which are briefly described. However, more detail on the preprocessing pipeline and vocabulary coverage would strengthen reproducibility (an illustrative sketch of such a pipeline follows this list).
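As an illustration of the kind of preprocessing detail that would aid reproducibility, here is a hedged sketch of a tokenization/cleaning/normalization pipeline for Devanagari-script parallel text. The specific rules (Unicode NFC, punctuation splitting, a length cutoff) are assumptions; the paper does not state them.

```python
# Illustrative preprocessing sketch for a Hindi-Dogri parallel corpus.
# The exact cleaning and normalization rules are assumptions; the paper
# does not specify which steps were applied.
import re
import unicodedata

def preprocess(line: str) -> str:
    line = unicodedata.normalize("NFC", line)      # Unicode normalization
    line = re.sub(r"\s+", " ", line).strip()       # collapse whitespace
    # Split Devanagari danda/double danda and common punctuation into tokens.
    line = re.sub(r"([\u0964\u0965,;:?!\"'()])", r" \1 ", line)
    return re.sub(r"\s+", " ", line).strip()

def clean_pair(src: str, tgt: str, max_len: int = 80):
    """Drop empty or overlong sentence pairs, as is common before MT training."""
    src, tgt = preprocess(src), preprocess(tgt)
    if not src or not tgt:
        return None
    if len(src.split()) > max_len or len(tgt.split()) > max_len:
        return None
    return src, tgt
```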
Implementation Details
• The NMT models were trained using a Tesla P100-PCIE GPU, with training, validation, and test sets split at 80/10/10. A batch size of 32 and 50 epochs were used for training.
• These details are helpful, but the paper would benefit from including additional hyperparameters, such as learning rates, dropout rates, and optimizer settings (a sketch of the reported setup follows this list).
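For reference, a sketch of the reported setup (80/10/10 split, batch size 32, 50 epochs) might look as follows. Here `load_parallel_corpus` and `make_dataset` are hypothetical helpers, and the optimizer and learning rate are assumptions: precisely the details the review asks the authors to add.

```python
# Sketch of the reported training setup: 80/10/10 split, batch size 32,
# 50 epochs. load_parallel_corpus and make_dataset are hypothetical helpers;
# the optimizer and learning rate are assumed (Adam with Keras defaults).
from sklearn.model_selection import train_test_split

pairs = load_parallel_corpus()                                          # hypothetical loader
train, rest = train_test_split(pairs, test_size=0.2, random_state=42)   # 80 / 20
val, test = train_test_split(rest, test_size=0.5, random_state=42)      # 10 / 10

train_ds = make_dataset(train, batch_size=32)    # hypothetical batching helper
val_ds = make_dataset(val, batch_size=32)
model.fit(train_ds, validation_data=val_ds, epochs=50)  # model from the sketch above
```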
Evaluation Methods
• The models are evaluated using both automatic (BLEU scores) and human evaluation (adequacy and fluency) on a five-point scale. This dual approach provides a well-rounded perspective on model performance.
• However, the inter-rater reliability and the evaluation protocol for human judgments are not described and should be added for clarity (an illustrative sketch follows this list).
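To illustrate what the missing protocol details could include, here is a hedged sketch of corpus-level BLEU plus one common inter-rater agreement statistic (Cohen's kappa). The `sacrebleu` and `scikit-learn` libraries and all data values are assumptions, not the authors' tooling.

```python
# Illustrative evaluation sketch: corpus BLEU plus inter-rater agreement.
# sacrebleu and scikit-learn are assumed here; the data are placeholders.
import sacrebleu
from sklearn.metrics import cohen_kappa_score

hypotheses = ["dogri output 1", "dogri output 2"]          # system translations (placeholder)
references = [["dogri reference 1", "dogri reference 2"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")

# Agreement between two raters on the 5-point fluency scale; quadratic
# weights penalise large disagreements more than adjacent ones.
rater_a = [5, 4, 3, 4, 2]
rater_b = [5, 3, 3, 4, 2]
print(cohen_kappa_score(rater_a, rater_b, weights="quadratic"))
```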
The findings are well-aligned with the presented results. The higher performance of the RBMT system is justified given the linguistic similarity between Hindi and Dogri and the limited dataset size. Evaluation using BLEU scores and human judgments (adequacy and fluency) supports the conclusions.
However, the manuscript lacks statistical significance testing to confirm differences between models. Details on whether results were averaged across runs are missing. Reproducibility would improve with more implementation specifics and access to the dataset or code. The authors appropriately acknowledge limitations and propose hybrid systems for future work.
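One standard remedy for the missing significance testing is paired bootstrap resampling (Koehn, 2004). Below is a minimal sketch, assuming sentence-aligned system outputs `sys_a` and `sys_b`, references `refs`, and the `sacrebleu` library; this is not the authors' procedure.

```python
# Paired bootstrap resampling for BLEU (after Koehn, 2004) -- a minimal
# sketch, not the authors' method. sys_a, sys_b, and refs are assumed to be
# sentence-aligned lists of strings.
import random
import sacrebleu

def paired_bootstrap(sys_a, sys_b, refs, n_samples=1000):
    """Estimate how often system A beats system B on resampled test sets."""
    n, wins_a = len(refs), 0
    for _ in range(n_samples):
        idx = [random.randrange(n) for _ in range(n)]  # resample with replacement
        a = sacrebleu.corpus_bleu([sys_a[i] for i in idx], [[refs[i] for i in idx]]).score
        b = sacrebleu.corpus_bleu([sys_b[i] for i in idx], [[refs[i] for i in idx]]).score
        wins_a += a > b
    # Fraction of resamples in which A wins; values near 1.0 indicate a
    # statistically significant advantage for system A.
    return wins_a / n_samples

# e.g. paired_bootstrap(rbmt_out, nmt_out, references) > 0.95 would support
# a significant advantage for the RBMT system at roughly the 5% level.
```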
Figure 4 (Encoder-Decoder Architecture of NMT) is not clearly explained. Please provide a legend or description for the colors and shapes used to help readers interpret the figure accurately.
TABLE I appears to have missing or unreadable characters. Kindly review and correct the formatting to ensure all phonetic symbols and text are displayed properly.
English translations are necessary for TABLE II, TABLE IV, and TABLE V to enhance the accessibility of the content for international readers.
Consider performing a thorough proofreading of the manuscript to correct grammatical issues and improve sentence clarity. Language polishing would significantly enhance the readability of the paper.
The study is commendable in addressing a low-resource language like Dogri. Including links to the corpus or code repository, if possible, would further benefit the research community.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.