Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on November 6th, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on November 25th, 2025.
  • The first revision was submitted on December 4th, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on December 10th, 2025.

Version 0.2 (accepted)

Academic Editor

Accept

Thank you for resubmitting your manuscript after revising it in light of the reviewers' comments. I am pleased to inform you that your manuscript is being recommended for publication.

[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]

Reviewer 1

Basic reporting

The manuscript has been significantly improved in terms of methodological clarity, experimental validation, and presentation quality. The added explanations, quantitative analyses, and revisions strengthen the technical contribution and practical relevance of the work.

Overall, the study now meets the required standards for publication. I recommend the manuscript for acceptance.

Experimental design

The experimental design is well documented and sufficiently detailed, with appropriate data preprocessing, evaluation metrics, and citations.

Validity of the findings

The results are valid and well supported by the experiments, and the conclusions are clearly aligned with the reported findings while acknowledging limitations and future directions.

Additional comments

The manuscript is clearly written, technically sound, and suitable for publication in its current form.

Reviewer 2

Basic reporting

The authors have improved the article.

Experimental design

The changes have been made as suggested.

Validity of the findings

Yes, appropriate.

Version 0.1 (original submission)

Academic Editor

Minor Revisions

Dear authors,

Thank you for your submission. Based on input from experts in the field, your manuscript is being returned for the revisions they suggest, together with my own comments below. Please revise the manuscript carefully and resubmit it after incorporating these comments.

AE Comments:

• Enhance the multimodal fusion mechanism by incorporating cross-attention layers and leveraging pretrained foundation models to achieve deeper interaction between image, text, and temporal features.
• Strengthen model explainability through saliency maps, token-level attribution, and counterfactual reasoning to increase transparency and support audit and regulatory decision-making.
• Expand the generative augmentation strategy by integrating diffusion-based synthesis or conditional anomaly generation to capture more diverse and realistic fraud patterns.
• Improve robustness under distribution shifts by adopting continual learning, domain adaptation, and cross-system transfer evaluations to ensure stability across heterogeneous accounting environments.
• Improve the language of the paper.
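The first suggestion above (and Reviewer 2's later question on fusion type) contrasts simple feature concatenation with cross-attention. A minimal sketch of the two options, assuming each modality has already been encoded into fixed-size token vectors; the function names are hypothetical, and the learned projections W_Q, W_K, W_V of a real attention layer are replaced by the identity to keep the example dependency-free:

```python
import math


def concat_fusion(a, b):
    """Early fusion: concatenate two modality feature vectors for one sample."""
    return list(a) + list(b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    Each query token (one modality, e.g. temporal features) attends over the
    key/value tokens of another modality (e.g. text features). Assumption:
    learned projections are omitted (identity) for brevity.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output vector per query token.
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused
```

Concatenation leaves all cross-modal interaction to later layers, whereas cross-attention lets each token of one modality weight the other modality's tokens explicitly. For example, with two identical keys a query attends equally to both, so the output is the average of their value vectors.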

**PeerJ Staff Note**: Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1

Basic reporting

The manuscript presents a multimodal fusion generative adversarial network (MF-GAN) designed for accounting anomaly synthesis and detection. It offers a strong technical contribution by combining temporal–spatial representation learning with a dynamic-weight joint optimisation mechanism. The topic is timely and relevant, especially for intelligent financial supervision. However, the paper requires deeper theoretical justification, more rigorous evaluation, and clearer presentation of experimental reproducibility.
While MF-GAN integrates multimodal and joint optimisation concepts, the originality of its architecture relative to existing multimodal GANs (e.g., MAD-GAN, Tad-GAN, or CGAN variants) is not sufficiently emphasised. The authors should explicitly articulate which architectural elements are novel rather than incremental.
The dynamic-weight collaborative loss is presented as a core innovation, yet its mathematical formulation and adaptation mechanism are only briefly described. It remains unclear how the weighting coefficients are updated and whether they depend on the reconstruction–discrimination gradient ratio or an external adaptive function.
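For concreteness, one way such a coefficient could depend on the reconstruction and discrimination gradients is a ratio of their norms at a shared layer, clipped and smoothed over iterations. The sketch below is an assumption for illustration only (the names `adaptive_weight` and `total_loss`, the clipping bounds, and the moving-average momentum are all hypothetical), not a description of the manuscript's mechanism; gradients are passed in as flattened lists so it runs without a deep-learning framework:

```python
import math

# Hypothetical sketch: not the manuscript's formulation.


def l2_norm(grad):
    """Euclidean norm of a flattened gradient vector."""
    return math.sqrt(sum(g * g for g in grad))


def adaptive_weight(grad_rec, grad_adv, prev=None, momentum=0.9,
                    eps=1e-8, lo=0.0, hi=1e4):
    """Assumed gradient-ratio rule for the adversarial weight lambda.

    lambda = ||dL_rec/dw|| / (||dL_adv/dw|| + eps), clipped to [lo, hi],
    where w is a shared layer; optionally smoothed with an exponential
    moving average of the previous value `prev`.
    """
    raw = l2_norm(grad_rec) / (l2_norm(grad_adv) + eps)
    raw = min(max(raw, lo), hi)
    if prev is None:
        return raw
    return momentum * prev + (1.0 - momentum) * raw


def total_loss(loss_rec, loss_adv, lam):
    """Joint objective L = L_rec + lambda * L_adv (lambda treated as a constant)."""
    return loss_rec + lam * loss_adv
```

Balancing by gradient norms keeps the two terms from pushing the shared parameters with very different magnitudes, and the smoothing damps step-to-step oscillation in lambda. Stating whether the actual rule resembles this, or uses an external adaptive function instead, is exactly what the comment asks the authors to clarify.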
The manuscript lacks interpretive linkage between learned latent features and real-world accounting anomaly categories. Without visualisation or attribution analysis (e.g., SHAP, Grad-CAM), it is difficult to assess the practical diagnostic value of the model for auditors or regulators.
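Attribution of the kind requested here can be illustrated with occlusion (feature ablation), a simpler alternative to SHAP or Grad-CAM: each input feature is replaced by a reference value and the change in the anomaly score is recorded. A minimal sketch, assuming a generic `score_fn` that maps a feature vector to an anomaly score and a hypothetical baseline such as per-account historical means:

```python
from typing import Callable, List, Sequence


def occlusion_attribution(score_fn: Callable[[Sequence[float]], float],
                          x: Sequence[float],
                          baseline: Sequence[float]) -> List[float]:
    """Attribution of feature i = score(x) - score(x with feature i set to baseline[i])."""
    base_score = score_fn(x)
    attributions = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline[i]  # replace one feature with its reference value
        attributions.append(base_score - score_fn(occluded))
    return attributions


# Toy linear anomaly scorer (hypothetical), used only to show the output.
WEIGHTS = [2.0, 0.0, -1.0]


def toy_score(v: Sequence[float]) -> float:
    return sum(w * f for w, f in zip(WEIGHTS, v))


print(occlusion_attribution(toy_score, [1.0, 5.0, 3.0], [0.0, 0.0, 0.0]))
# → [2.0, 0.0, -3.0]
```

For a linear scorer the result reduces to w_i (x_i - b_i). For a nonlinear model such as a GAN discriminator, occluding one feature at a time ignores feature interactions, which is one reason Shapley-based methods are often preferred.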
The figures are informative but require greater clarity and uniform scaling (especially Figures 5–8). Mathematical symbols should be consistently formatted (distinguishing between parameters θ, λ, and hyperparameters). Several long paragraphs in Section 3 could be structured more logically.

Experimental design

The ablation experiments (E1–E3) provide only qualitative conclusions. Quantitative results in tabular form (precision, recall, F1 for each variant) would better demonstrate the contribution of each module, particularly the joint optimisation mechanism.
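A quantitative ablation table of this kind needs only per-variant confusion counts on the test set. A minimal sketch; the helper names are hypothetical and no figures from the manuscript are implied:

```python
def prf1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def ablation_table(counts):
    """Format a table from a mapping: variant name -> (tp, fp, fn).

    The counts would come from each ablated variant's test-set predictions.
    """
    lines = [f"{'Variant':<12}{'Precision':>10}{'Recall':>10}{'F1':>10}"]
    for name, (tp, fp, fn) in counts.items():
        p, r, f = prf1(tp, fp, fn)
        lines.append(f"{name:<12}{p:>10.3f}{r:>10.3f}{f:>10.3f}")
    return "\n".join(lines)
```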
The discussion of mode collapse and convergence is confined to the Limitations section. The authors should include empirical indicators (e.g., loss trajectories, gradient norms, Wasserstein distance curves) to substantiate the claim that the proposed gradient constraint enhances stability.
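One of the suggested stability indicators, a Wasserstein distance curve, can be approximated cheaply per epoch from samples. A sketch, assuming equal-size batches of real and generated records: for one-dimensional samples of equal size, the empirical Wasserstein-1 distance is the mean absolute difference of the sorted values, and averaging it over features gives a marginal proxy (not the full multivariate distance). Function names are hypothetical:

```python
def wasserstein_1d(real, fake):
    """Empirical W1 between two equal-size 1-D samples.

    In 1-D the optimal coupling matches sorted values, so W1 is the mean
    absolute difference of the sorted samples.
    """
    if len(real) != len(fake) or not real:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(real), sorted(fake))) / len(real)


def mean_marginal_w1(real_rows, fake_rows):
    """Average per-feature W1: a cheap proxy, not the multivariate W1."""
    n_features = len(real_rows[0])
    return sum(
        wasserstein_1d([r[j] for r in real_rows], [f[j] for f in fake_rows])
        for j in range(n_features)
    ) / n_features
```

Logging `mean_marginal_w1(real_batch, fake_batch)` once per epoch, alongside the loss values and gradient norms, would give the kind of curve the comment asks for.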
The hardware setup is specified, but no measurements of training time, memory footprint, or inference latency are provided. Since real-world accounting systems often operate under time constraints, scalability evaluation is essential.
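A minimal sketch of how inference latency and memory could be reported, assuming `predict` is any callable that scores a batch (a hypothetical stand-in for the trained detector). Note that `tracemalloc` only sees Python-heap allocations, so GPU memory would need the deep-learning framework's own counters:

```python
import math
import statistics
import time
import tracemalloc


def profile_inference(predict, batch, warmup=3, runs=20):
    """Return median and nearest-rank p95 latency (ms) plus peak Python-heap bytes."""
    for _ in range(warmup):  # exclude one-off costs such as caching or lazy init
        predict(batch)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(batch)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    p95 = timings[max(0, math.ceil(0.95 * len(timings)) - 1)]
    # Measure memory in a separate run so tracing overhead does not skew timings.
    tracemalloc.start()
    predict(batch)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"median_ms": statistics.median(timings), "p95_ms": p95, "peak_bytes": peak}
```

Training time could be reported the same way by wrapping one epoch instead of one batch.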

Validity of the findings

While MF-GAN integrates multimodal and joint optimisation concepts, the originality of its architecture relative to existing multimodal GANs (e.g., MAD-GAN, Tad-GAN, or CGAN variants) is not sufficiently emphasised. The authors should explicitly articulate which architectural elements are novel rather than incremental.
The dynamic-weight collaborative loss is presented as a core innovation, yet its mathematical formulation and adaptation mechanism are only briefly described. It remains unclear how the weighting coefficients are updated and whether they depend on the reconstruction–discrimination gradient ratio or an external adaptive function.
The manuscript lacks interpretive linkage between learned latent features and real-world accounting anomaly categories. Without visualisation or attribution analysis (e.g., SHAP, Grad-CAM), it is difficult to assess the practical diagnostic value of the model for auditors or regulators.

Additional comments

The figures are informative but require greater clarity and uniform scaling (especially Figures 5–8). Mathematical symbols should be consistently formatted (distinguishing between parameters θ, λ, and hyperparameters). Several long paragraphs in Section 3 could be structured more logically.

Reviewer 2

Basic reporting

This paper addresses an important and emerging topic: GAN-based anomaly detection for complex, high-dimensional accounting data. The methodological framework is ambitious and well-structured, combining residual temporal learning and spatial feature extraction. Nonetheless, several conceptual, empirical, and structural weaknesses should be addressed before publication.
- The introduction effectively presents the challenges in accounting anomaly detection but fails to justify why GANs are more suitable than other self-supervised or transformer-based models recently used in tabular financial data modeling. A comparative theoretical rationale is necessary.
- The description of the regional residual learning module is too abstract. Key parameters such as filter size, feature dimension, stride, and activation functions should be clearly stated. A pseudo-code block or algorithmic outline would improve reproducibility.
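The kind of algorithmic outline requested could be as small as the following runnable sketch of a residual convolution step, in which kernel size, stride, and padding appear explicitly as parameters. It assumes a single channel, ReLU activation, and stride 1 with "same" padding inside the residual path; all of these are placeholders for the values the authors would report, and the function names are hypothetical:

```python
def conv1d(x, kernel, bias=0.0, stride=1, padding=0):
    """Single-channel 1-D convolution (cross-correlation, as in most DL libraries).

    Output length = floor((len(x) + 2 * padding - len(kernel)) / stride) + 1.
    """
    padded = [0.0] * padding + list(x) + [0.0] * padding
    k = len(kernel)
    out_len = (len(padded) - k) // stride + 1
    return [sum(kernel[j] * padded[i * stride + j] for j in range(k)) + bias
            for i in range(out_len)]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, kernel, bias=0.0):
    """y = x + ReLU(conv(x)); an odd kernel with stride 1 and 'same' padding keeps lengths equal."""
    if len(kernel) % 2 == 0:
        raise ValueError("use an odd kernel size for 'same' padding")
    h = relu(conv1d(x, kernel, bias, stride=1, padding=len(kernel) // 2))
    return [a + b for a, b in zip(x, h)]
```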

Experimental design

- The paper claims multimodal fusion but does not detail whether fusion occurs via concatenation, cross-attention, or shared latent projection. The absence of ablation on fusion type leaves uncertainty about the optimal integration mechanism.
- Although several baselines (PCA, KNN, LSTM-AE, MAD-GAN, Tad-GAN) are tested, their hyperparameter settings and training epochs are not reported. Without consistent tuning, performance comparisons may not be fair. Reproducing results using official implementations would improve validity.
- The datasets are controlled and balanced via resampling, but robustness under realistic noisy or cross-period conditions is not analyzed. The authors could simulate distributional drift or label noise to test stability under imperfect conditions.
- The loss terms employ gradient penalties and Wasserstein distances, yet the authors do not quantify how these design choices improve generation quality. Visualizing discriminator decision boundaries or showing FID-like metrics could make the improvement tangible.
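The last two suggestions can be sketched together: perturb the evaluation data (covariate shift or label flips) and quantify distributional differences with a Fréchet-style distance, the quantity underlying FID. This is a minimal sketch under stated assumptions: numeric tabular rows, binary labels, and diagonal covariances, which reduce the Fréchet term to the sum over features of (mu_a - mu_b)^2 + (sd_a - sd_b)^2. FID proper uses full covariances over learned features, and all function names here are hypothetical:

```python
import math
import random


def flip_labels(labels, rate, seed=0):
    """Symmetric label noise: flip each binary label with probability `rate`."""
    rng = random.Random(seed)
    return [1 - y if rng.random() < rate else y for y in labels]


def apply_drift(rows, shift, scale=1.0):
    """Covariate drift: x' = scale * x + shift applied to every feature."""
    return [[scale * x + shift for x in row] for row in rows]


def _mean_std(values):
    mu = sum(values) / len(values)
    return mu, math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def frechet_distance_diag(rows_a, rows_b):
    """Fréchet distance between Gaussians fitted to two sets of rows, assuming
    diagonal covariances: sum over features of (mu_a - mu_b)^2 + (sd_a - sd_b)^2."""
    total = 0.0
    for j in range(len(rows_a[0])):
        mu_a, sd_a = _mean_std([r[j] for r in rows_a])
        mu_b, sd_b = _mean_std([r[j] for r in rows_b])
        total += (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
    return total
```

For example, `frechet_distance_diag(rows, apply_drift(rows, shift=0.5))` adds about 0.25 per feature, so detection metrics on drifted or label-flipped copies can be reported against a known perturbation size. The same distance between real and generated rows would give the FID-like generation-quality figure the comment mentions.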

Validity of the findings

- The model involves complex dual-branch generators and large convolutional layers. Without reporting FLOPs, parameter counts, or convergence rates, it is impossible to assess efficiency compared to simpler baselines.
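Parameter counts and operation counts of the kind requested can be derived analytically per layer. A sketch for 1-D convolutional and dense layers (names hypothetical); it counts multiply-accumulates (MACs), and FLOPs are commonly reported as roughly twice the MAC count, though conventions differ between tools:

```python
def conv1d_cost(c_in, c_out, kernel, length_in, stride=1, padding=0, bias=True):
    """Parameters and MACs of a 1-D convolution layer."""
    length_out = (length_in + 2 * padding - kernel) // stride + 1
    params = c_out * c_in * kernel + (c_out if bias else 0)
    macs = c_out * length_out * c_in * kernel  # one MAC per weight per output position
    return params, macs


def linear_cost(d_in, d_out, bias=True):
    """Parameters and MACs of a fully connected layer for a single input vector."""
    params = d_in * d_out + (d_out if bias else 0)
    macs = d_in * d_out
    return params, macs


def total_cost(layers):
    """Sum (params, macs) pairs over a list of layers."""
    return sum(p for p, _ in layers), sum(m for _, m in layers)
```

Summing these over every layer of each branch gives figures that can be set directly beside the simpler baselines.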

Additional comments

- The manuscript is well-written overall, but minor grammatical inconsistencies remain (e.g., missing articles, pluralization errors). Equations lack consistent numbering and some variables are undefined upon first appearance (in Section 3.3). Revising for clarity and formatting uniformity is recommended.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.