All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
I have assessed the revised manuscript and find that all comments were appropriately addressed. Thanks and best wishes.
[# PeerJ Staff Note - this decision was reviewed and approved by Jyotismita Chaki, a PeerJ Section Editor covering this Section #]
The authors revised the paper in line with the reviewers' comments.
The necessary additions were made.
Experimental results are clearer now. Research is more structured.
Novelty and impact are clearly defined.
Dataset descriptions were added.
No additional issues were found.
The updated manuscript exhibits significantly enhanced clarity and refinement in comparison to the earlier version. The authors have improved the structure, clarified the introduction and literature review, and ensured that all figures and tables meet high-quality standards. The English writing is professional and readily comprehensible. The references are sufficient and current, and the methodological details are now presented in a manner that is easy to understand.
The research utilizes a clearly articulated research design that is suitable for machine-learning-based classification within the realm of educational data mining.
The authors have included significant information concerning the sampling procedure, such as the timeframe and the social media platforms utilized. They also elaborated on feature selection through Recursive Feature Elimination (RFE), provided justification for the algorithms used, and discussed data balancing techniques involving SMOTE-ENN.
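For readers less familiar with the technique, a minimal sketch of RFE as typically used in scikit-learn (the estimator and the 16-of-22 feature count are illustrative assumptions drawn from the manuscript's reported range, not the authors' exact configuration):

```python
# Minimal RFE sketch (illustrative; the authors' estimator and
# feature count may differ).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=22, random_state=0)

# Recursively drop the weakest feature until 16 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=16, step=1)
rfe.fit(X, y)

print("Selected feature mask:", rfe.support_)
print("Feature ranking (1 = selected):", rfe.ranking_)
```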
The methodological reasoning is robust, and the design allows for reproducibility. Furthermore, the authors sufficiently explain the rationale behind the inapplicability of certain statistical tests, particularly in light of the single 80/20 data split.
The authors provide compelling evidence supporting the validity of the model. Concerns regarding overfitting are effectively addressed through suitable statistical justification, which includes an EPV of 40.6, Wilson confidence intervals ranging from 96.3% to 98.4%, and consistent performance trends observed across 12 different algorithms. The discussion surrounding limitations, generalizability, and ethical considerations is comprehensive and conducted in a professional manner. The conclusions drawn are well-supported by the results and do not exaggerate the findings.
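For context, the reported Wilson interval can be approximated from the headline accuracy and an assumed test-set size (roughly 20% of the 4,561 records; this is an illustrative reconstruction, not the authors' computation):

```python
# Wilson 95% CI for a reported accuracy (sketch; n = 912 is an
# assumed test-set size, i.e. ~20% of 4,561 records).
from statsmodels.stats.proportion import proportion_confint

n = 912
correct = round(0.976 * n)  # ~890 correct predictions
low, high = proportion_confint(correct, n, alpha=0.05, method="wilson")
print(f"Wilson 95% CI: {low:.3f} to {high:.3f}")  # ~0.964 to 0.984
```

Under these assumptions the interval closely matches the 96.3%–98.4% range the authors report.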
The authors have evidently addressed the feedback provided by the reviewers and have significantly enhanced the manuscript. The clarity, methodological transparency, and contextualization of the findings have all been improved. I do not identify any major issues that persist. The manuscript is well-prepared and appropriate for publication.
Dear Author,
Thank you for submitting your paper.
After carefully reviewing the comments from all reviewers, I would like to invite you to submit a revised version of your manuscript. In addition, I would like to offer the following suggestions to help improve your work:
1. Does this paper solve a regression problem or a classification problem? Can this be clarified?
2. What is the name of the target Saudi university from which the dataset was collected?
3. What types of variables were included in the dataset, what scale of measurement was used, and what sampling approach was applied?
4. Which social media platform was used to collect the samples?
5. How were sample adequacy and reliability assessed?
6. Why are there no statistics for dimensions and factors in Table 1?
7. Can comparative tables be created to show the important features identified by the ANOVA F-test, Chi-square test, Recursive Feature Elimination (RFE), and Mutual Information (MI)? Could these be added to the Feature Selection section? (A sketch of one way to assemble such a table appears after this list.)
8. Were statistical tests applied to compare the performance of the 12 algorithms? If not, can these be added to strengthen the results? Can Figure 1 be updated accordingly?
9. For Tables 2–9, can accuracy values be reported with 95% confidence intervals (CI) and standard deviation (SD ±) to demonstrate result stability?
10. Can the Discussion section be expanded to include more details and comparisons between the study’s results and those reported in the existing literature?
11. In the Conclusion, has the soundness of the KNN model been verified through a final comparison of all algorithms using statistical tests (as suggested in point 8)?
12. What are the limitations of the study in terms of sample size, data collection platforms, result generalizability, and applied techniques?
13. What guidelines and recommendations can be provided for future work?
14. Language issues should also be addressed.
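Regarding point 7, a minimal sketch of one way such a comparative table could be assembled (synthetic data and an illustrative estimator; the authors' feature names and settings would differ):

```python
# Sketch for point 7: tabulating feature scores across the four
# selection methods (illustrative synthetic data).
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, chi2, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_pos = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative input

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
table = pd.DataFrame({
    "F_score": f_classif(X, y)[0],
    "chi2": chi2(X_pos, y)[0],
    "MI": mutual_info_classif(X, y, random_state=0),
    "RFE_rank": rfe.ranking_,  # 1 = kept by RFE
}, index=[f"feat_{i}" for i in range(X.shape[1])])
print(table.sort_values("F_score", ascending=False))
```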
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The Academic Editor has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
Professional English is used throughout.
Literature references are valid.
Raw data shared.
The number of rows corresponds to the number of participants' responses.
Some pictures are of low quality; higher-resolution versions should be provided.
1) In the Introduction, provide more details on the specific contributions and novelty of this paper compared to existing literature.
2) Over what period were the interview data collected? (2024? 2025?)
3) More statistical description of the investigated region should be provided, showing how student dropout has changed over the years at those universities.
4) Provide more specifics on the assumptions made. Which k value was optimal for your kNN, and which methods were used to choose the optimal k value for your research? (A sketch of one common selection approach appears after this list.)
5) Expand on the reasons why those 12 algorithms were used. Compare and contrast them with respect to your aim. Why not consider, for example, C-means? Would it be well suited to your task?
6) In Modeling and Evaluation, you varied the number of features from 16 to 22. Expand on the feature engineering: which features were more and less significant, and which had higher and lower priority for your research.
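Regarding point 4 above, a minimal sketch of one common way to choose k, via cross-validated grid search (the grid, scoring metric, and data are illustrative assumptions):

```python
# Sketch: selecting k for kNN by cross-validated grid search
# (illustrative grid and scoring; the authors' setup may differ).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=16, random_state=0)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": range(1, 32, 2)},  # odd k avoids vote ties
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X, y)
print("Optimal k:", grid.best_params_["n_neighbors"])
```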
The intended use of the results is unclear. Expand on the use cases and practical implications of the results. How do they affect an applicant? Does the research imply that universities would not accept an applicant whose features predict a possible dropout? The research should not violate the human right to education, and it should not discriminate against any person who is willing to get an education.
The manuscript is written in professional, clear English and generally easy to follow.
The introduction establishes the global and Saudi-specific context well, citing relevant and recent literature. However, the related work section at times reads as a listing of studies rather than a critical synthesis. Strengthening the comparative discussion would improve clarity.
Figures and tables are relevant and support the narrative. Resolution for plots (e.g., SHAP/LIME visualizations) should be increased for readability.
References are appropriate, up-to-date, and include both foundational (e.g., SHAP, LIME) and recent regional studies.
The paper is self-contained and presents sufficient context for replication.
The research questions (RQ1–RQ3) are clear, relevant, and aligned with the aims of the journal.
The study design is appropriate, beginning with a pilot study and moving to a full questionnaire dataset (n = 4,561) across Saudi universities. Ethical approval and informed consent are clearly reported.
Feature selection and modeling methods are diverse and appropriate: F-statistic, chi-square, RFE, and MI combined with twelve machine learning models.
Class imbalance was handled with SMOTE-ENN, which is justified and well described.
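For context, a minimal sketch of SMOTE-ENN resampling with imbalanced-learn, using default parameters and synthetic data mirroring the reported 77.5%/22.5% class ratio (the authors' settings may differ):

```python
# Minimal SMOTE-ENN sketch (defaults; illustrative synthetic data).
from collections import Counter
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.775, 0.225],
                           random_state=0)
print("Before:", Counter(y))

# SMOTE oversamples the minority class, then ENN removes noisy samples.
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X, y)
print("After: ", Counter(y_res))
```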
One concern is the sampling strategy: reliance on self-reported survey data distributed via social media may introduce sampling bias. This should be acknowledged more explicitly as a limitation.
The choice of KNN as the top-performing model (97.6% accuracy) is notable, as ensemble methods often outperform KNN in similar studies. Further detail on validation (e.g., k-fold cross-validation, train/test splits, avoidance of data leakage) is essential to confirm robustness.
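One leakage-safe arrangement, sketched below under the assumption that SMOTE-ENN and KNN are combined, is to place the resampler inside an imbalanced-learn pipeline so that it is fitted only on the training folds of each cross-validation split:

```python
# Leakage-safe evaluation sketch: resampling happens inside the
# pipeline, so it never sees the validation fold.
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, weights=[0.775, 0.225],
                           random_state=0)
pipe = Pipeline([("balance", SMOTEENN(random_state=0)),
                 ("knn", KNeighborsClassifier(n_neighbors=5))])

scores = cross_val_score(pipe, X, y, scoring="f1",
                         cv=StratifiedKFold(5, shuffle=True, random_state=0))
print("Fold F1 scores:", scores.round(3))
```

Reporting per-fold scores in this way would also supply the standard deviations requested elsewhere in the reviews.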
Results are presented clearly with comparisons across feature selection strategies and models.
The very high accuracy and F1-scores (>97%) warrant scrutiny. These numbers are unusually high for dropout prediction tasks and raise questions of overfitting or potential data leakage. The authors should expand on validation procedures and report whether hyperparameter tuning and cross-validation were applied consistently.
XAI results (SHAP and LIME) are well-integrated and provide interpretable insights into dropout risk factors. The use of the LEAF framework is a strength, as it allows quantitative comparison of explanation quality.
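For context, a minimal sketch of producing a LIME explanation for a single record (the model, feature names, and class labels are illustrative stand-ins, not the authors' artifacts):

```python
# Minimal LIME sketch for one record (illustrative placeholders).
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = KNeighborsClassifier().fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=[f"feat_{i}" for i in range(6)],
    class_names=["retained", "dropout"], mode="classification")
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=6)
print(exp.as_list())  # (feature condition, local weight) pairs
```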
Findings are consistent with prior research (e.g., GPA, academic year, family/friend support, and employment status as key predictors) and extend knowledge by emphasizing sociocultural factors.
Limitations are acknowledged (dataset size, computational cost of SHAP), but the discussion of generalizability beyond Saudi Arabia could be expanded.
Strengths:
Large and diverse dataset covering academic, personal, and sociocultural factors.
Rigorous methodological design, combining multiple ML models with XAI.
Ethical approval and informed consent appropriately documented.
Practical implications for Saudi universities are well articulated.
Suggestions for Improvement:
Provide additional detail on model validation to address concerns of overfitting.
Expand the discussion of generalizability to non-Saudi contexts and how sociocultural factors may vary internationally.
Refine some figures and captions for clarity and readability.
Condense overly long sentences in the methodology section to improve readability.
Include confidence intervals or statistical tests to strengthen claims of model superiority (see the sketch after this list).
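A minimal sketch of one such test, McNemar's test on paired predictions from two classifiers over the same held-out set (the predictions below are illustrative placeholders, not the study's results):

```python
# McNemar's test sketch for paired classifier comparison
# (placeholder predictions, not the study's results).
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

y_true = np.random.RandomState(0).randint(0, 2, 200)
pred_a = y_true.copy()
pred_a[:8] = 1 - pred_a[:8]      # model A makes 8 errors
pred_b = y_true.copy()
pred_b[4:20] = 1 - pred_b[4:20]  # model B makes 16 errors

a_ok, b_ok = pred_a == y_true, pred_b == y_true
table = [[np.sum(a_ok & b_ok), np.sum(a_ok & ~b_ok)],
         [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]]
print("p-value:", mcnemar(table, exact=True).pvalue)
```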
Brief Summary
This research addresses student dropout prediction in Saudi universities using machine learning and explainable AI techniques. The authors collected data from 4,561 students across multiple institutions and developed predictive models incorporating academic, personal, and sociocultural factors. The K-Nearest Neighbors (KNN) model with Recursive Feature Elimination achieved 97.6% accuracy, with GPA, academic year, and employment status identified as key dropout predictors. The study also evaluated explainability methods using the LEAF framework, finding that LIME outperformed SHAP across all metrics.
Study Design and Methodology Strengths
The research addresses a significant problem with substantial educational impact, as evidenced by the 40-50% dropout rate in Saudi higher education institutions. The comprehensive methodology incorporating three dimensions (personal, academic, sociocultural) represents a valuable contribution to understanding dropout complexity beyond traditional academic-only approaches.
The authors demonstrate methodological rigor through their systematic approach: pilot study refinement, appropriate preprocessing with SMOTE-ENN for class imbalance, comparison of four feature selection techniques, and evaluation of twelve ML models. The inclusion of explainable AI evaluation using the LEAF framework adds methodological sophistication often missing in educational ML studies.
References and Citations
The references are generally appropriate and recent, though some key educational ML studies appear to be missing. The citation format is consistent, but several references lack complete page numbers or DOI information.
Minor Improvements:
1. Figure Quality: Improve resolution and readability of SHAP/LIME visualizations (Figures 4-13)
2. Literature Review: Include more recent studies on educational ML applications
3. Discussion Depth: Expand discussion of practical implications for university administrators
Areas of Concern
Sampling and Generalizability Issues: The data collection methodology lacks sufficient detail regarding sampling strategy, representativeness across universities, and potential selection bias. With responses collected through "social media platforms and collaboration with several Saudi universities," the study may suffer from convenience sampling bias that limits generalizability. The authors should provide detailed demographic breakdown by university, faculty, and program type.
Statistical Rigor: The paper lacks essential statistical analyses including confidence intervals, statistical significance testing for model comparisons, and cross-validation procedures. The exceptionally high accuracy of 97.6% raises concerns about potential overfitting, particularly given the class imbalance (77.5% non-dropout vs. 22.5% dropout) even after SMOTE-ENN application.
Methodological Transparency: Further methodological clarity is needed regarding train/test splits, cross-validation procedures, and hyperparameter tuning. The authors mention using default parameters but don't specify which parameters or justify this choice, which impacts reproducibility.
Specific Comments
Data Collection and Preprocessing (Lines 173-190)
• Line 177: Specify the exact sampling methodology and university selection criteria
• Lines 184-189: Provide detailed justification for SMOTE-ENN parameter choices and comparison with alternative balancing techniques
• Figures 2–3: Include statistical tests comparing class distributions before and after balancing
Experimental Design (Lines 295-315)
• Lines 296-299: Clarify the exact procedure for feature subset selection - was this done randomly or systematically?
• Tables 2–9: Include confidence intervals and statistical significance tests for model comparisons
• Lines 314-315: The statement about KNN achieving "best performance" needs statistical support
Technical Soundness Assessment
Strengths:
• Appropriate use of multiple feature selection techniques
• Comprehensive model comparison across diverse algorithm families
• Novel application of LEAF framework for XAI evaluation
• Reasonable preprocessing pipeline with duplicate removal and categorical encoding
Weaknesses:
• Insufficient cross-validation procedures
• Lack of statistical significance testing
• Potential overfitting indicated by unusually high accuracy
• Missing details on hyperparameter optimization
Major Revisions Required:
1. Enhanced Statistical Rigor: Implement proper cross-validation, provide confidence intervals, and conduct statistical significance tests for all model comparisons. Provide additional validations to confirm the robustness of the results.
2. Sampling Methodology: Provide detailed description of university selection, response rates by institution, and potential bias assessment
3. Reproducibility: Include complete hyperparameter specifications, train/test split procedures, and random seed information
4. External Validation: Discuss generalizability limitations and provide framework for adapting findings to other contexts
Validity of Findings
Model Performance Claims: The claimed superior performance of KNN with RFE achieving 97.6% accuracy requires more rigorous statistical validation. The comparison lacks statistical significance testing, and the performance difference between top models may not be meaningful. Additionally, the choice of accuracy as the primary metric may be inappropriate for imbalanced datasets, even after resampling.
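A minimal sketch of imbalance-robust metrics that could be reported alongside accuracy (the predictions below are illustrative, not the authors' results):

```python
# Imbalance-robust metrics sketch (illustrative predictions with a
# ~77.5%/22.5% class ratio).
from sklearn.metrics import (balanced_accuracy_score, f1_score,
                             matthews_corrcoef)

y_true = [0] * 155 + [1] * 45
y_pred = [0] * 150 + [1] * 5 + [1] * 40 + [0] * 5

print("Balanced accuracy:", round(balanced_accuracy_score(y_true, y_pred), 3))
print("F1 (dropout class):", round(f1_score(y_true, y_pred), 3))
print("MCC:", round(matthews_corrcoef(y_true, y_pred), 3))
```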
Feature Importance Interpretation: While the identification of GPA, academic year, and employment status as key factors aligns with existing literature, the interpretability analysis could be strengthened. The SHAP and LIME explanations show some discrepancies in feature rankings that deserve more thorough discussion.
External Validity: The study's generalizability beyond Saudi universities is questionable due to cultural and institutional specificity. The authors should discuss limitations more thoroughly and provide guidance for adapting findings to other educational contexts.
Specific Comments
Results Interpretation (Lines 374-411)
• Tables 10-13: Explain the substantial differences between SHAP and LIME feature importance rankings
• Lines 427-431: The discussion of GPA as the "most influential factor" oversimplifies complex relationships shown in the data
LEAF Evaluation (Lines 412-423)
• Table 15: Provide statistical significance tests for LIME vs. SHAP performance differences
• Computational efficiency comparison (0.62s vs. 1.86s) lacks context about dataset size and hardware specifications
Overall Recommendation: Major Revisions
This research addresses an important educational challenge with a comprehensive methodological approach. However, significant concerns regarding statistical rigor, sampling methodology, and result validation prevent acceptance in its current form. The exceptionally high reported accuracy requires more thorough validation, and methodological transparency needs substantial improvement for reproducibility.
The study has potential to make meaningful contributions to educational ML literature once these fundamental issues are addressed. The authors should focus particularly on enhancing statistical validation, providing methodological transparency, and tempering performance claims with appropriate statistical context.
Ratings
* Originality/Novelty : Average
* Significance of Content : Average
* Quality of Presentation: Average
* Scientific Soundness: Low
* Interest to Readers: Average
* Overall Merit: Low
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.