All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The authors have addressed all of the reviewers' comments. Based on the reviewers' recommendations and my own assessment, this manuscript can be accepted for publication.
[# PeerJ Staff Note - this decision was reviewed and approved by Shawn Gomez, a PeerJ Section Editor covering this Section #]
The manuscript meets the journal standards.
The authors have responded well to my comments, and I am satisfied with the changes.
The authors have added the missing details on model hyperparameters and included additional experiments with another gradient boosting algorithm. I believe the paper is now suitable for acceptance.
No comment.
The revised version is well-organized and clearly written. I don't have any other comments.
I have no further comments.
I have no further comments.
No further comments.
Please revise the manuscript to address the comments from the reviewers. Consider comparing the proposed model with CatBoost and AutoInt as suggested.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
Overall, the manuscript is well-structured and relatively clear. The methods are described in sufficient detail, enabling readers to understand how the proposed model is implemented and evaluated. The experimental setup is reasonably well designed, covering key aspects such as data preprocessing and model evaluation against both baseline and advanced models. The results are presented in an organized and accessible manner, with tables and figures clearly described and easy to follow.
The points below need to be clarified:
- The manuscript reports that Random Search was applied for hyperparameter tuning across the machine learning models, yet it does not specify key details such as the number of parameter combinations explored, the total number of iterations conducted, or the exact tuning procedure for each model (an illustrative sketch of the expected level of detail follows this list).
- For the Advanced Tabular Models, almost no information on hyperparameter tuning is provided. The authors should explicitly clarify whether these models were tuned and, if so, which hyperparameters were searched and what values were selected.
- The reported results for TabTransformer appear to be incomplete. If so, please update the tables with the full set of results.
- CatBoost is widely recognized as a strong model for tabular data, yet it is not included in the experiments. The authors may want to provide CatBoost results on the dataset to strengthen the benchmarking and provide a more comprehensive evaluation of the proposed framework.
Although the points listed above may be relatively minor, they still need to be addressed.
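To illustrate the level of detail expected for the first and last points, here is a minimal sketch of a documented Random Search over a CatBoost classifier, assuming scikit-learn's RandomizedSearchCV and the catboost package. The search space, number of sampled combinations, cross-validation scheme, and scoring metric shown are hypothetical placeholders rather than the authors' actual settings, and a synthetic dataset stands in for the loan data:

```python
# Illustrative only: the search ranges, n_iter, and dataset are placeholders,
# not the authors' actual settings.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the loan repayment dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hypothetical search space; each model's actual ranges should be reported.
param_distributions = {
    "depth": [4, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1],
    "iterations": [100, 300, 500],
    "l2_leaf_reg": [1, 3, 5],
}

search = RandomizedSearchCV(
    estimator=CatBoostClassifier(verbose=0, random_seed=42),
    param_distributions=param_distributions,
    n_iter=20,          # number of parameter combinations sampled
    cv=5,               # 5-fold cross-validation
    scoring="roc_auc",  # tuning criterion
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Reporting at least the search ranges, the number of sampled combinations, the cross-validation scheme, and the scoring metric for each model would make the tuning procedure reproducible.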
The paper is written in clear and professional English. The introduction provides a relatively comprehensive overview and sets the context for predicting loan repayment capacity. However, several aspects require improvement. Although the introduction mentions multiple prior studies related to repayment ability, it does not convincingly identify the research gap that this paper aims to address, which weakens the persuasiveness of the scientific contribution. Furthermore, the practical significance of the study should be emphasized more strongly, for example by highlighting its application value for commercial banks, credit institutions, and financial risk management in real-world contexts.
The paper presents a series of experiments, including comparative studies with machine learning models and advanced tabular model approaches on loan repayment capacity data. However, the experimental design lacks ablation studies analyzing the impact of individual components within the proposed model architecture, which are essential for verifying the actual contribution of each module. In Table 3, the authors report only mean values; the standard deviation should also be included to demonstrate the stability and reliability of the model. In addition, an important baseline for tabular data tasks, the MLP, is missing from the experiments; including it would enhance the comprehensiveness and fairness of the evaluation (a sketch of such a baseline follows). Finally, the “future research” section remains rather shallow; it should place greater emphasis on the current model’s limitations to provide clearer directions for subsequent research.
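To illustrate the two missing elements, here is a minimal sketch of an MLP baseline whose score is reported as mean plus or minus standard deviation over repeated seeded runs, assuming scikit-learn's MLPClassifier; the architecture, number of repeats, and synthetic data are placeholders, not the paper's setup:

```python
# Illustrative MLP baseline with mean +/- std over repeated seeded runs;
# the architecture and synthetic data are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

scores = []
for seed in range(5):  # repeat with different seeds to estimate stability
    mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300,
                        random_state=seed)
    mlp.fit(X_tr, y_tr)
    scores.append(roc_auc_score(y_te, mlp.predict_proba(X_te)[:, 1]))

print(f"AUC: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```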
The comparison in this study primarily focuses on Transformer-based architectures and attention mechanisms. However, it would strengthen the validity of the findings if the authors also included results from AutoInt (https://arxiv.org/abs/1810.11921), which has demonstrated competitive performance on tabular data and is likewise grounded in self-attention. Moreover, the absence of computational efficiency metrics (e.g., training time, inference latency, memory usage) makes it difficult to assess whether the proposed method is practical for real-world deployment; a minimal timing sketch follows.
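For the efficiency metrics, even simple wall-clock measurements would help. The sketch below uses time.perf_counter with a placeholder model standing in for the proposed architecture; memory usage could be reported analogously with a profiler:

```python
# Illustrative timing of training and batch inference; the model here is a
# placeholder, not the authors' proposed architecture.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
model = GradientBoostingClassifier(random_state=0)

t0 = time.perf_counter()
model.fit(X, y)
train_time = time.perf_counter() - t0

t0 = time.perf_counter()
model.predict_proba(X[:1000])  # latency for a 1000-row batch
infer_time = time.perf_counter() - t0

print(f"training: {train_time:.2f} s, "
      f"inference: {infer_time * 1000:.1f} ms per 1000 rows")
```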
The bolded values in Tables 1, 2, and 3 should be explicitly explained, as it is not immediately clear whether they represent best performance, statistical significance, or another criterion.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.