Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on August 19th, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on September 15th, 2025.
  • The first revision was submitted on October 1st, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on October 7th, 2025.

Version 0.2 (accepted)

Academic Editor

Accept

The authors have addressed all of the reviewers' comments. Based on reviewers' recommendations and my own assessment, this manuscript can be accepted for publication.

[# PeerJ Staff Note - this decision was reviewed and approved by Shawn Gomez, a PeerJ Section Editor covering this Section #]

Reviewer 1

Basic reporting

The manuscript meets the journal standards.

Experimental design

The authors have responded well to my comments, and I am satisfied with the changes.

Validity of the findings

The authors have added the missing details on model hyperparameters and included additional experiments with another gradient boosting algorithm. I believe the paper is now suitable for acceptance.

Additional comments

No comment.

Reviewer 2

Basic reporting

The revised version is well-organized and clearly written. I don't have any other comments.

Experimental design

I have no further comments.

Validity of the findings

I have no further comments.

Additional comments

No further comments.

Version 0.1 (original submission)

Academic Editor

Major Revisions

Please revise the manuscript to address the comments from the reviewers. Consider comparing the proposed model with CatBoost and AutoInt as suggested.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1

Basic reporting

Overall, the manuscript is well-structured and relatively clear. The methods are described in sufficient detail, enabling readers to understand how the proposed model is implemented and evaluated. The experimental setup is reasonably well designed, covering key aspects such as data preprocessing and model evaluation against both baseline and advanced models. The results are presented in an organized and accessible manner, with tables and figures clearly described and easy to follow.

Experimental design

The points below need to be clarified:
- The manuscript reports that Random Search was applied for hyperparameter tuning across the machine learning models, yet it does not specify key details such as the number of parameter combinations explored, the total iterations conducted, or the exact tuning procedure for each model.
- For the Advanced Tabular Models, almost no information on hyperparameter tuning is provided. The authors should explicitly clarify whether these models were tuned, and if so, which parameters were selected.
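To make concrete the level of detail being requested, a disclosure along the following lines would answer these questions. This is a stdlib-only sketch; the search space, parameter names, and iteration count are hypothetical and not taken from the manuscript:

```python
import random

# Illustrative search space -- all names and values are hypothetical.
search_space = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
}

def random_search(space, n_iter, seed=42):
    """Sample n_iter configurations uniformly from the grid.

    Reporting n_iter, the seed, and the space itself is precisely the
    tuning detail the review asks the authors to state per model.
    """
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_iter)]

configs = random_search(search_space, n_iter=20)
```

Stating this trio (space, number of sampled configurations, seed) per model would make the tuning procedure reproducible.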

Validity of the findings

- The reported results for TabTransformer appear to be incomplete. If so, please update the table with the full set of results.
- CatBoost is widely recognized as a strong model for tabular data, yet it is not included in the experiments. The authors may want to provide CatBoost results on the dataset to strengthen the benchmarking and provide a more comprehensive evaluation of the proposed framework.

Additional comments

Although the points listed above may be relatively minor, they still need to be addressed.

Reviewer 2

Basic reporting

The paper is written in clear and professional English. The introduction provides a relatively comprehensive overview and sets the context for predicting loan repayment capacity. However, several aspects require improvement. Although the introduction mentions multiple prior studies related to repayment ability, it does not convincingly identify the research gap that this paper aims to address. This weakens the persuasiveness of the scientific contribution. Furthermore, the practical significance of the study should be emphasized more strongly, for example, its application value for commercial banks, credit institutions, and financial risk management in real-world contexts.

Experimental design

The paper presents a series of experiments, including comparative studies with machine learning models and advanced tabular model approaches on loan repayment capacity data. However, the experimental design lacks ablation studies analyzing the impact of individual components within the proposed model architecture, which are essential for verifying the actual contribution of each module. In Table 3, the authors report only mean values; the standard deviation should also be included to demonstrate the stability and reliability of the model. In addition, an important baseline for tabular data tasks, the MLP, is missing from the experiments. Including it would enhance the comprehensiveness and fairness of the evaluation. Finally, the "future research" section remains rather shallow; it should place greater emphasis on the current model's limitations to provide clearer directions for subsequent research.
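The mean-plus-deviation reporting requested for Table 3 amounts to summarizing per-fold (or per-run) scores as mean ± standard deviation. A minimal stdlib sketch, with purely illustrative accuracy values:

```python
from statistics import mean, stdev

# Hypothetical per-fold accuracies for one model (values illustrative only).
fold_scores = [0.912, 0.905, 0.918, 0.910, 0.908]

# Sample standard deviation (n - 1 denominator), as statistics.stdev computes.
summary = f"{mean(fold_scores):.3f} ± {stdev(fold_scores):.3f}"  # "0.911 ± 0.005"
```

Reporting both numbers lets readers judge whether differences between models in the table exceed run-to-run variability.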

Validity of the findings

The comparison in this study primarily focuses on Transformer-based architectures and attention mechanisms. However, it would strengthen the validity of the findings if the authors also included results from AutoInt (https://arxiv.org/abs/1810.11921), which has demonstrated competitive performance on tabular data and is likewise grounded in self-attention. Moreover, the absence of some computational efficiency metrics (e.g., training time, inference latency, memory usage) makes it difficult to validate whether the proposed method is practical for real-world deployment.
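The efficiency metrics mentioned above (training time, inference latency, memory) can be collected with a small wrapper around each fit/predict call. A stdlib-only sketch; the workload here is a stand-in, since a real benchmark would wrap the model's own training and inference routines:

```python
import time
import tracemalloc

def measure(fn, *args):
    """Run fn(*args), returning (result, elapsed seconds, peak traced bytes).

    Captures the kind of efficiency metrics the review asks for:
    wall-clock time via perf_counter and peak memory via tracemalloc.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Stand-in workload; replace with e.g. model.fit(X, y) or model.predict(X).
res, secs, peak_bytes = measure(sum, range(100_000))
```

Reporting these alongside accuracy would let readers assess deployment practicality.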

Additional comments

The bolded values in Tables 1, 2, and 3 should be explicitly explained, as it is not immediately clear whether they represent best performance, statistical significance, or another criterion.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.