Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on December 16th, 2024 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on April 23rd, 2025.
  • The first revision was submitted on June 30th, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on July 25th, 2025.

Version 0.2 (accepted)

· Jul 25, 2025 · Academic Editor

Accept

The reviewers are satisfied with the recent changes, so I can recommend this article for publication.

[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]

·

Basic reporting

This version looks better. Thanks for addressing the review comments.

Experimental design

-

Validity of the findings

-

Reviewer 3 ·

Basic reporting

The authors have addressed my comments, and I have no further comments.

Experimental design

-

Validity of the findings

-


Version 0.1 (original submission)

· Apr 23, 2025 · Academic Editor

Major Revisions

**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff

·

Basic reporting

The paper is written in professional English with only minor grammatical issues.
The structure follows academic standards, and references are relevant but could be expanded.
Figures and tables should be more clearly labeled and discussed.

Experimental design

The methodology is sound but needs clearer justifications for algorithm choices.
Data preprocessing steps are not described in sufficient detail.

Validity of the findings

The conclusions align with the results but require more analysis of error cases.
The impact of different hyperparameter settings on performance is not fully explored.

Additional comments

The paper presents a novel and impactful approach but requires refinements in methodological rigor, data analysis, and presentation clarity. Thank you for such a nice article. Please refer to the review comments below.

Abstract:
1. The abstract provides a high-level summary but lacks clarity on the key contributions and comparative performance metrics.
2. You mention “showcasing its superiority over several baseline solutions”; you need to name the baseline models.
3. A brief summary of the dataset sizes and the models evaluated would be helpful.
4. Can you specify the key quantitative performance improvements over the baselines?

Introduction:
1. The transition from prior work to the proposed solution is abrupt. You may provide a clearer explanation of why existing PLM-based approaches are insufficient.
2. You may need to clearly differentiate between methodological contributions and empirical findings.

Related Work:
1. You may need to add a critical analysis of the limitations of prior models.
2. Please justify why PLMs need additional clustering for performance enhancement.

Dataset and Preprocessing:
1. For each dataset, provide more information regarding class distribution and data preprocessing.
2. Did you apply any data augmentation techniques to mitigate bias and improve generalization?
3. For the YouTube comments data, how was dataset diversity ensured?


Methodology:
1. Please justify why the particular clustering algorithm was chosen.
2. Please justify the use of BERT embeddings as opposed to alternative embedding methods. What value did BERT embeddings add?

Experimental Setup:
1. To assess whether improvements over baselines are statistically meaningful, add significance tests (e.g., p-values).
2. How did you ensure that all models were trained under comparable settings, and what are the explicit differences?
3. How were hyperparameters (e.g., learning rate, batch size, dropout rate) selected?
4. You mention an NVIDIA RTX 3090 Ti GPU, but there is no discussion of model training time, computational cost, or scalability.
5. You claim that combining clustering and PLMs improves performance, but you do not provide an analysis that isolates the effect of clustering. You could conduct experiments with (1) PLM only (without clustering), (2) clustering only (without PLM), and (3) both PLM and clustering, and compare their performance to show the individual contribution of each.
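To illustrate the significance tests requested in items 1 and 5: one common option for comparing two classifiers scored on the same test examples is McNemar's test. The sketch below is a dependency-free exact (binomial) version; it is an illustration, not the authors' evaluation protocol, and the function name and prediction lists are hypothetical. The same comparison could be run pairwise across the three ablation variants (PLM only, clustering only, both).

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers scored on the same test set.

    Only discordant pairs carry information:
      b = examples model A gets right and model B gets wrong,
      c = examples model A gets wrong and model B gets right.
    Under H0 (equal error rates), b ~ Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Two-sided p-value: twice the lower binomial tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * lower_tail)


# Hypothetical predictions for illustration only (not results from the manuscript).
y_true = ["EGY"] * 10
full_model = ["EGY"] * 10             # e.g. PLM + clustering
plm_only = ["EGY"] * 2 + ["SAU"] * 8  # e.g. PLM without clustering
print(mcnemar_exact(y_true, full_model, plm_only))  # (8, 0, 0.0078125)
```

With real data, y_true and the prediction lists would be the full held-out test set; a paired bootstrap over macro-F1 is another frequently used alternative.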

Results and Discussion:
1. There is no justification of why certain dialects are easier or harder to classify.
2. Did you analyze (1) which dialects were most misclassified, (2) what the common error patterns were, and (3) the reasons for these errors?
3. Did you analyze model performance with different amounts of labeled data?
4. You claim superiority over existing methods; consider adding percentage improvements and statistical significance tests to support this claim and add more value to the research.
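Item 3 (performance with different amounts of labeled data) is usually studied with a learning curve: retrain on growing fractions of the training set and evaluate on a fixed test set. A minimal sketch of the subsampling step follows, assuming stratified sampling per dialect so that rare dialects are not dropped; the function name, toy data, and the commented train_and_evaluate placeholder are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(examples, labels, fraction, seed=0):
    """Keep roughly `fraction` of each dialect's examples (at least one per dialect).

    Sampling per dialect preserves the original class proportions, so a smaller
    training set does not silently lose rare dialects.
    """
    rng = random.Random(seed)  # fixed seed so each subset is reproducible
    by_label = defaultdict(list)
    for x, y in zip(examples, labels):
        by_label[y].append(x)
    subset = []
    for y in sorted(by_label):
        items = by_label[y]
        k = max(1, round(len(items) * fraction))
        subset.extend((x, y) for x in rng.sample(items, k))
    return subset


# Hypothetical toy data: 100 "EGY" and 20 "SUD" sentences.
examples = [f"sentence {i}" for i in range(120)]
labels = ["EGY"] * 100 + ["SUD"] * 20
for fraction in (0.1, 0.25, 0.5, 1.0):
    subset = stratified_subsample(examples, labels, fraction)
    print(fraction, Counter(y for _, y in subset))
    # A hypothetical train_and_evaluate(subset, test_set) call would go here.
```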

Limitations:
1. Are certain dialects overrepresented in the dataset? Did you conduct a data distribution analysis showing the number of examples per dialect? For example, if 40% of the dataset consists of one dialect, the model may be biased toward it.
2. There is no discussion of how the model would perform on unseen dialects.
3. If you rely on PLMs and clustering, would the approach scale well to very large datasets?
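A data distribution analysis of the kind requested in item 1 can be sketched as follows: count examples per dialect, flag any dialect whose share meets a threshold (0.4 here, mirroring the 40% example), and report the ratio of the largest to the smallest class. The function name and toy labels are hypothetical.

```python
from collections import Counter


def dialect_distribution(labels, dominance_threshold=0.4):
    """Per-dialect counts and shares, dominant dialects, and the imbalance ratio."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {dialect: n / total for dialect, n in counts.items()}
    # Dialects whose share meets the threshold (0.4 mirrors the 40% example above).
    dominant = sorted(d for d, s in shares.items() if s >= dominance_threshold)
    # Imbalance ratio: size of the largest class divided by size of the smallest.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return counts, shares, dominant, imbalance_ratio


# Hypothetical labels for illustration only.
labels = ["EGY"] * 50 + ["SAU"] * 30 + ["SUD"] * 20
counts, shares, dominant, ratio = dialect_distribution(labels)
print(dominant, ratio)  # ['EGY'] 2.5
```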


Conclusion:
1. The conclusion does not effectively summarize the key contributions.
2. Although the Limitations section covers challenges, the conclusion does not acknowledge specific weaknesses of the study.
3. The final paragraph of the conclusion should transition smoothly into future research directions.

Reviewer 3 ·

Basic reporting

In this manuscript, Abdelmajeed et al. propose a framework for multi-dialect Arabic identification. The framework assigns each sentence to a dialect; however, the predicted dialect is sometimes far from the correct one. Based on Figure 7, the model classifies many examples of the Saudi dialect as the Moroccan dialect. In addition, Figure 3 shows that the model incorrectly classifies all examples of some dialects, such as Emirates and Tunisia. Therefore, the authors should discuss the model predictions in more detail and whether they make sense.


1- Language and format
The paragraphs are very long. For example, the introduction section consists of a single paragraph spanning lines 50 to 105. In addition, there is no spacing between paragraphs. Some information is also missing in the references; for example, line 610 contains missing information. Table captions contain "table 1". The authors have used abbreviations that are not defined in the text, such as DOH and BEI. Finally, in the confusion matrices, some labels are not aligned correctly with the columns, especially in Figure 7.

Experimental design

1- Model Accessibility

The authors created a separate model for each dataset, and there is no final model that can be used to address the target problem. Which final model is used in Figure 1? Additionally, I recommend that the authors choose one model and design a Python package that allows other users to import it and use this final model directly.

2- Description of the datasets and unbalanced datasets

The text lacks dataset descriptions, and there is no information about the number of dialects in each dataset. From Figure 7, the reader can infer that the number of Egyptian dialect examples is close to 1,000, whereas the number of Sudanese dialect examples is about 50. These per-dialect counts indicate an unbalanced dataset; however, the authors do not mention data imbalance anywhere in the text. This problem plays a major role in the predictive power of the model. For example, the model performs better in Figure 5 than in Figure 7 because the number of examples per class in Figure 5 is similar.
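The effect of this imbalance on headline metrics can be made concrete with a small calculation. Using only the two approximate class sizes read from Figure 7 (about 1,000 Egyptian and 50 Sudanese examples; the real dataset contains more dialects), a hypothetical model that always predicts Egyptian reaches about 95% accuracy but only about 49% macro-F1, because the Sudanese class contributes an F1 of zero. The stdlib-only sketch below computes both metrics; the function name and the degenerate predictor are illustrative assumptions, not the authors' evaluation code.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Accuracy plus macro-F1 (unweighted mean of per-class F1 scores)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Undefined precision/recall (zero denominators) are treated as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return accuracy, sum(f1_scores) / len(f1_scores)


# Approximate class sizes as read from Figure 7 in this review; the predictor is hypothetical.
y_true = ["EGY"] * 1000 + ["SUD"] * 50
y_pred = ["EGY"] * 1050  # a degenerate model that always predicts the majority dialect
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={macro_f1:.3f}")  # accuracy=0.952 macro_f1=0.488
```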

In addition, the text gives no information about who annotated the examples in the dataset. Was the annotation performed manually or semi-automatically?




3- Baseline model

Looking at Table 1, the authors provide an example of how one sentence can be written in different dialects. Individual words also differ across dialects, for example Kayfas, kyf, and Azay (all meaning "how"). Each dialect has special words that can be used as a feature list. Therefore, the reader may expect a Naïve Bayes classifier to be used as a baseline model.
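A word-frequency baseline of the kind suggested here could look like the stdlib-only multinomial Naïve Bayes sketch below (add-one smoothing, whitespace tokenization). The class name and the toy corpus are hypothetical; the toy labels only reuse the transliterations mentioned in this review (Azay, Kayfas), and a real baseline would need the actual training data and proper Arabic tokenization. scikit-learn's MultinomialNB is an established alternative.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesDialect:
    """Multinomial Naive Bayes over whitespace tokens with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # dialect -> token frequencies
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(sentence.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior P(dialect)
            for word in sentence.split():
                if word not in self.vocab:
                    continue  # tokens unseen in training carry no evidence here
                count = self.word_counts[label][word]
                # Smoothed log likelihood log P(word | dialect).
                score += math.log((count + 1) / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus built around the transliterations in this review.
train = ["azay el hal", "azay ismak", "kayfas el hal", "kayfas ismak"]
labels = ["EGY", "EGY", "TUN", "TUN"]
model = NaiveBayesDialect().fit(train, labels)
print(model.predict("azay"), model.predict("kayfas ismak"))  # EGY TUN
```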

 

4- Model description

The manuscript lacks a detailed structural description of the model.

 

5- Using more datasets in testing.

  The models were trained and tested on the same dataset. The authors can use other datasets in this study to validate the prediction power of the proposed models.

Validity of the findings

1- Discussion of model predictions

The text lacks a discussion of the model predictions. Many dialect examples overlap with examples from other dialects, yet the authors only report model accuracy in the tables and confusion matrices. From the confusion matrix in Figure 7, the reader can see that the model predicts the Saudi dialect correctly for 298 examples, but also predicts 98 Saudi examples as the Moroccan dialect. Approximately 30% of Saudi dialect examples are classified as Moroccan. This indicates that Moroccan is the dialect the model most often confuses with Saudi. In addition, many examples of the Moroccan dialect are classified as the Saudi dialect. Please note that these numbers were read based on the x-axis labels; because of irregular spacing, I could not reliably align the labels with the columns.

 

Figure 3 demonstrates that the model incorrectly classifies all examples of some dialects, such as Emirates and Tunisia. Why? What is the reason for that? Does the dataset require additional annotations?
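The per-dialect discussion requested above can be derived directly from a confusion matrix: per-dialect recall, the dialect each one is most often mistaken for, and the dialects that are never predicted correctly. A minimal sketch, assuming rows are true dialects and columns are predictions in the same label order; the function name and the 3x3 matrix are hypothetical, not the counts in Figures 3 or 7.

```python
def analyze_confusion(labels, matrix):
    """Per-dialect recall and most frequent wrong prediction.

    matrix[i][j] = number of examples whose true dialect is labels[i] and whose
    predicted dialect is labels[j]. Assumes at least two dialects.
    """
    report = {}
    for i, true_label in enumerate(labels):
        row = matrix[i]
        total = sum(row)
        recall = row[i] / total if total else 0.0
        # Largest off-diagonal cell in this row = the dialect it is most often mistaken for.
        top_count, top_label = max((count, labels[j]) for j, count in enumerate(row) if j != i)
        report[true_label] = {
            "recall": recall,
            "top_confusion": top_label if top_count > 0 else None,
            "top_confusion_share": top_count / total if total else 0.0,
        }
    # Dialects that are never predicted correctly (the situation described for Figure 3).
    zero_recall = [d for d, r in report.items() if r["recall"] == 0.0]
    return report, zero_recall


# Hypothetical matrix for illustration only.
labels = ["SAU", "MOR", "TUN"]
matrix = [
    [8, 2, 0],  # true SAU
    [1, 9, 0],  # true MOR
    [0, 5, 0],  # true TUN: every example predicted as MOR
]
report, zero_recall = analyze_confusion(labels, matrix)
print(report["SAU"]["top_confusion"], zero_recall)  # MOR ['TUN']
```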

 

2- Two related dialects.

When the model predicts an incorrect dialect, the prediction is expected to be a closely related dialect. As noted above, the model predicts many Saudi examples as Moroccan. Therefore, the authors should identify which dialects are closely related by conducting a co-occurrence analysis for each dialect: How many words are shared between two dialects? How many special words does each dialect have? Which two dialects are the closest? In addition, this analysis can be extended to a hierarchical clustering tree. When the model confuses two dialects, as in Figure 7, the authors should validate the prediction against the closest dialect using co-occurrence analysis or other analyses.
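The co-occurrence analysis described above can be sketched in a few steps: count shared words per dialect pair, normalize with Jaccard similarity (an assumption; no specific measure is prescribed here), list words unique to each dialect, and feed the pairwise similarities into single-linkage agglomerative clustering to obtain the merge order behind a hierarchical clustering tree. The function names, the whitespace tokenizer, and the toy corpus with placeholder dialects A, B, C are hypothetical.

```python
from itertools import combinations


def vocabulary_overlap(corpus):
    """corpus maps dialect -> list of sentences (whitespace-tokenized here)."""
    vocab = {d: {w for s in sents for w in s.split()} for d, sents in corpus.items()}
    pairs = {}
    for a, b in combinations(sorted(vocab), 2):
        shared = vocab[a] & vocab[b]
        union = vocab[a] | vocab[b]
        pairs[(a, b)] = {"shared": len(shared), "jaccard": len(shared) / len(union) if union else 0.0}
    # "Special" words: tokens that occur in exactly one dialect.
    special = {d: {w for w in vocab[d] if all(w not in vocab[o] for o in vocab if o != d)} for d in vocab}
    closest = max(pairs, key=lambda p: pairs[p]["jaccard"])
    return pairs, special, closest


def single_linkage_merges(dialects, similarity):
    """Agglomerative clustering: repeatedly merge the two most similar clusters.

    similarity maps sorted (a, b) dialect pairs to a score. The returned merge
    order is the information a dendrogram (hierarchical clustering tree) displays.
    """
    clusters = [frozenset([d]) for d in dialects]
    merges = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(clusters, 2):
            # Single linkage: cluster similarity = highest pairwise dialect similarity.
            s = max(similarity[tuple(sorted((a, b)))] for a in x for b in y)
            if best is None or s > best[0]:
                best = (s, x, y)
        s, x, y = best
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
        merges.append((sorted(x), sorted(y), s))
    return merges


# Hypothetical toy corpus; dialect names A, B, C are placeholders.
corpus = {
    "A": ["kyf halak", "kyf el bait"],
    "B": ["kayfas halak", "kayfas el bait"],
    "C": ["azay el bait"],
}
pairs, special, closest = vocabulary_overlap(corpus)
jaccard = {pair: stats["jaccard"] for pair, stats in pairs.items()}
print(closest, special["C"])  # ('A', 'B') {'azay'}
print(single_linkage_merges(sorted(corpus), jaccard))
```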





 

3- Role of large language models and GenAI in dialect prediction

Because LLMs are now used in many different ways, readers may expect that they can use LLMs for this task. I tried GROK (see Appendix 1 below) on the sentence “How do you pronounce the name of this place” in the Tunisian dialect from Table 1. GROK classified this sentence as Moroccan, not Tunisian, because the word Kayfas is used more frequently in Morocco. GROK explained that Tunisian uses a different common word instead of Kayfas. Even though GROK gave a wrong answer, it was close to the Tunisian dialect.

 

In addition, the datasets contain unbalanced examples. Can GenAI help generate more examples for dialects with few examples?

 

Therefore, the authors should discuss the role of LLMs and GenAI in their study.

 

---------------------------------------------

Appendix 1

Using GROK for the sentence “How do you pronounce the name of this place” in Tunisian Arabic

 

Direct answer

The dialect of the sentence "كيفاش تنطق اسم لابسة هذه" appears to be Moroccan Darija, based on the use of the word "كيفاش" (kifash), which is considered characteristic of this dialect.
Research indicates that "كيفاش" is used mainly in Morocco and is not used in other dialects such as Algerian, Tunisian, or Libyan.

Sentence analysis

The sentence contains "كيفاش", which means "how" and is a common word in Moroccan Darija. Words such as "لابسة" and "هذه" are also consistent with the Moroccan linguistic context, where "لابسة" means "wearing" and is used in the Maghrebi dialects.

Cultural context

The sentence appears informal and typical of everyday conversation, which supports the idea that it is Moroccan Darija.

Detailed report

Introduction

The goal of this report is to identify the dialect of the sentence "كيفاش تنطق اسم لابسة هذه" based on linguistic analysis and comparison with other Arabic dialects. Reliable online sources were used to ensure the accuracy of the analysis, with a focus on the word "كيفاش" as the key to identifying the dialect.

Linguistic analysis

The word "كيفاش": "كيفاش" is considered a distinctive word of Moroccan Darija. According to Mo3jam - كيفاش, "كيفاش" is composed of "كي" and "فاش", means "how", and is common in Morocco. The site indicates that it is used in everyday contexts such as "كيفاش اعمل كفتاجي؟" ("How do I make kafteji?"), which reflects its use in informal conversation.
Comparison with other dialects:
In Algerian Darija, "كيف" or "وكيف" is used to express "how", according to eArabic - How to Say How Are You in Arabic, which states that "كيداير" is used in Algeria and Morocco to express "how are you".
In Tunisian Darija, "كيف" is also used, as shown in Ithaca Bound Languages - How to greet someone in Tunisian Arabic, which states that "كيفنـّك؟" is used to express "how are you?" and notes that "كيفاش" is considered more formal or derived from Modern Standard Arabic.
In Libyan Darija, no evidence was found of "كيفاش" being part of the everyday dialect, according to Omniglot - Libyan Arabic language, which focuses on local dialects without mentioning "كيفاش".

Cultural context and usage

The sentence "كيفاش تنطق اسم لابسة هذه" appears informal and typical of everyday conversation, which is consistent with the nature of Moroccan Darija. The word "لابسة" means "wearing" and is common in the Maghrebi dialects, while "هذه" ("this") is used to refer to a specific thing and is consistent with the Moroccan context.
The research shows that "كيفاش" is associated with a Moroccan news website named كيفاش, which reinforces the idea that it is used mainly in Morocco.

Dialect comparison table

To illustrate the difference, here is a table comparing how "how" is said in the Maghrebi dialects:

Dialect | How "how" is said | Notes
Moroccan (Darija) | كيفاش | Very common in everyday conversation.
Algerian | كيف، وكيف | "كيداير" is used for "how are you".
Tunisian | كيف | "كيفنـّك؟" is common; "كيفاش" is less common.
Libyan | كيف | No evidence of "كيفاش" being used.

Conclusion

Based on the analysis, the dialect of the sentence "كيفاش تنطق اسم لابسة هذه" appears to be Moroccan Darija. This conclusion is supported by the presence of "كيفاش" as a term characteristic of Morocco and the absence of its use as a core part of everyday speech in the other dialects.

Main references

Mo3jam - كيفاش
eArabic - How to Say How Are You in Arabic
Ithaca Bound Languages - How to greet someone in Tunisian Arabic

Omniglot - Libyan Arabic language


Reviewer 4 ·

Basic reporting

This paper introduces an approach that incorporates unlabelled corpora for Arabic dialect identification. In general, the language of the paper is clear, but ambiguity appears in many parts of the paper, as reported below.

The introduction establishes the context, while the literature review (related work) needs to be better written. The related work section reports neither the results nor the datasets of any previous work; even performance measures such as F1 score or accuracy are never mentioned. The authors should provide a glimpse of the reported results and the datasets used.

In addition, the literature has to be updated: there is no single reference from 2024 and only one from 2023 (which is one of the co-authors' papers). Various papers were published in 2024 and need to be discussed in the related work.
A quick search in Google Scholar returned dozens of related papers, such as:
1- Alsuwaylimi, Amjad A. "Arabic dialect identification in social media: A hybrid model with transformer models and BiLSTM." Heliyon 10.17 (2024).
2- Abdul-Mageed, M., Keleg, A., Elmadany, A., Zhang, C., Hamed, I., Magdy, W., ... & Habash, N. (2024, August). NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task. In Proceedings of The Second Arabic Natural Language Processing Conference (pp. 709-728).
3- Al-Azani, S., Alturayeif, N., Abouelresh, H., & Alhunief, A. (2024, June). A Comprehensive Framework and Empirical Analysis for Evaluating Large Language Models in Arabic Dialect Identification. In 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1-7). IEEE.
4- Alahmari, S., Atwell, E., & Alsalka, M. A. (2024, June). Saudi arabic multi-dialects identification in social media texts. In Science and Information Conference (pp. 209-217). Cham: Springer Nature Switzerland.
5- Yafooz, W. M. (2024). Enhancing Arabic Dialect Detection on Social Media: A Hybrid Model with an Attention Mechanism. Information, 15(6), 316.

The structure of the paper conforms to PeerJ standards, but details need to be added. All references should be revised, as they do not follow the correct format; some references lack the year of publication.



The introduction introduces the subject and makes the motivation clear.

Definitions of all terms and abbreviations need to be refined (highlighted in the attached PDF).

Experimental design

The article content is within Aims and scope of the journal article type.

The general structure of the approach is not clear; even the figure that is supposed to show the framework is unclear and ambiguous. The figure shows the approach as two disjoint systems with no connection.
I advise the authors to redraw the structure of the proposed model and give a general overview at the beginning of the Materials and Methods section.
Even the details of the model's components are presented in a disconnected way.

Data preprocessing details should be added.

The evaluation models are adequately described; only some writing rearrangements are needed.

Sources are cited as appropriate.

Validity of the findings

This work provides meaningful applications and results to the field of Arabic Dialect Identification.

Conclusions are well stated.


The conclusion should be extended to identify unresolved questions, limitations, and future directions.

Additional comments

These are general comments that the authors are advised to address:
1- For abbreviations, double-check for duplicate definitions (highlighted in the attached PDF).
2- The abbreviation DID does not stand for Arabic Dialect Identification; it is better to use the first letters of the full form, i.e., ADI.
3- In the abstract, it would be better to mention the outstanding results obtained.
4- Table 2 contains only 4 sentences, not 5 as stated in the caption.
5- Figure 1 is not mentioned in the manuscript.
6- The caption of Table 2 inside the manuscript is "The dataset statistics include MADAR-26 (26 dialects), MADAR-6 (6 dialects), NADI (dialects from 21 Arab countries), and QADI (18 dialects). These datasets offer comprehensive resources for Arabic dialect identification."

Whereas at the end of the manuscript it is "Comparison of accuracy and macro-F1 metrics, with BERT scores from the original publication on the QADI dataset; other models use the same BERT baseline, while remaining models are our implementations".

7- The captions of table 2 and table 4 are interchanged.

8- Tables 2 and 3 are not mentioned in the manuscript.

9- The general structure of the paper in Figure 1 should be clear, and the components of the model should be connected. In addition, a detailed description of the model should be added in the Materials and Methods section.

Annotated reviews are not available for download in order to protect the identity of reviewers who chose to remain anonymous.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.