Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on June 5th, 2025 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on July 23rd, 2025.
  • The first revision was submitted on August 12th, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • A further revision was submitted on December 2nd, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on December 11th, 2025.

Version 0.3 (accepted)

Academic Editor

Accept

The paper may be accepted.

[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]

Reviewer 4

Basic reporting

The authors revised the manuscript adequately according to the reviewer comments.
The manuscript is now more qualified and clear.
I have no further comments.
I suggest accepting it for publication in its present form.

Experimental design

-

Validity of the findings

-

Additional comments

-

Version 0.2

Academic Editor

Major Revisions

The authors have failed to improve the manuscript.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 4

Basic reporting

1- What are the running times (execution times, training times) of the methods?
Additional results may also be given in terms of execution times (computational cost).

2- In the "Related Works" section, providing a table that summarizes the related work would make the differences from previous studies easier to understand.

3- Tables 2, 3, and 4 should be explained and discussed in more detail.
The reasons for the results should also be discussed.
For example, why is it the best? Why is it increasing or decreasing?

4- The formulas (equations) for the metrics should be given: per-class precision (CP), recall (CR), and F1 score (CF1), and overall precision (OP), recall (OR), and F1 score (OF1).
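For reference, these metrics are commonly defined in the multi-label literature as follows (with $TP_i$, $FP_i$, and $FN_i$ denoting the true positives, false positives, and false negatives for class $i$ among $C$ classes); the authors should confirm these match their implementation:

```latex
\mathrm{CP} = \frac{1}{C}\sum_{i=1}^{C}\frac{TP_i}{TP_i + FP_i},\qquad
\mathrm{CR} = \frac{1}{C}\sum_{i=1}^{C}\frac{TP_i}{TP_i + FN_i},\qquad
\mathrm{CF1} = \frac{2\,\mathrm{CP}\cdot\mathrm{CR}}{\mathrm{CP} + \mathrm{CR}}

\mathrm{OP} = \frac{\sum_{i=1}^{C} TP_i}{\sum_{i=1}^{C} (TP_i + FP_i)},\qquad
\mathrm{OR} = \frac{\sum_{i=1}^{C} TP_i}{\sum_{i=1}^{C} (TP_i + FN_i)},\qquad
\mathrm{OF1} = \frac{2\,\mathrm{OP}\cdot\mathrm{OR}}{\mathrm{OP} + \mathrm{OR}}
```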

5- In the reference list, there is no paper published in 2025, and only one paper published in 2024.
I suggest the authors cite the most recent papers (especially those published in 2024 and 2025).

6- A concern is that no formal statistical analysis of the results is performed to indicate whether the differences in performance are statistically significant.
For example, the Friedman Aligned Rank Test, Wilcoxon Test, or Quade Test could be used.
The p-value can be calculated and compared with the significance level (e.g., p-value < 0.05).
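As a minimal sketch of what such an analysis could look like, the Wilcoxon signed-rank test can be applied to paired scores of two methods over the same datasets or splits. The score values below are invented purely for illustration, not taken from the manuscript:

```python
# Hypothetical illustration: Wilcoxon signed-rank test on paired scores
# (e.g., per-dataset mAP) of two methods. Values are made up for demonstration.
from scipy.stats import wilcoxon

method_a = [84.2, 91.5, 62.3, 88.1, 79.4, 85.0, 90.2, 77.8]  # e.g., proposed model
method_b = [83.1, 90.9, 61.0, 87.5, 78.2, 84.1, 89.6, 76.5]  # e.g., best baseline

# Two-sided test on the paired differences
stat, p_value = wilcoxon(method_a, method_b)
if p_value < 0.05:
    print(f"Difference is statistically significant (p = {p_value:.4f})")
else:
    print(f"Difference is not statistically significant (p = {p_value:.4f})")
```

For comparisons across more than two methods, the Friedman test (with a post-hoc procedure) is the usual choice.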

7- Some abbreviations are used in the text without their expansions being given.
For example, RNN, LDA, SRN, NUS-WIDE, GPT, VPT, etc.
The authors should spell out what each abbreviation stands for at its first use.

In line 53, GCN appears for the first time, so the full name should be given there; it is currently expanded only later:
Line 116 - Graph Convolutional Network (GCN)

Some abbreviations are given more than once.
For example:
Line 63 - Vision Transformer (ViT)
Line 135 - Vision Transformer (ViT)
Line 144 - Vision Transformer (ViT)
Line 156 - Vision Transformer (ViT)

8- The organization of the paper (the structure of the manuscript) may be written at the end of the "Introduction" section.
For example: "Section 2 presents ... Section 3 gives ...."

Experimental design

.

Validity of the findings

.

Version 0.1 (original submission)

Academic Editor

Major Revisions

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1

Basic reporting

While existing multi-label classification methods primarily focus on capturing label co-occurrence patterns, they often fail to explore the global contextual information and the semantic hierarchy inherent in multi-label datasets, especially in large-scale label spaces, leading to suboptimal feature extraction. To address this limitation, a novel Topic-Aware Transformer with Hierarchical Prompting Learning (TATHPL) is proposed, which is capable of hierarchically integrating latent topic information to improve the performance of multi-label image classification tasks. These prompts are hierarchically inserted into specific transformer blocks, where the self-attention mechanism facilitates their influence on subsequent feature extraction. The paper is well written and organized.

Experimental design

Experiments are well designed.

Validity of the findings

The findings are also interesting.

Additional comments

A small suggestion: there is related research in the theory of granular-ball computing. It is suggested that the authors introduce this relevant work so that readers can fully understand the context.

Reviewer 2

Basic reporting

See specific reports.

Experimental design

See specific reports.

Validity of the findings

See specific reports.

Additional comments

1. The abstract only reports the mAP results on the three datasets; it does not state the specific improvement over existing methods (e.g., the relative percentage improvement). It is suggested to add a quantitative analysis of the performance gain to strengthen the paper's persuasiveness (e.g., "how much is the improvement compared with the optimal baseline method").
2. In the introduction, the statement "existing methods ignore the potential topic information in label combinations" lacks specific literature support. It is suggested to illustrate the specific shortcomings in topic modeling using examples of the methods in references [43]-[49].
3. The topic extraction module (lines 156-174) does not explicitly state the basis for selecting the number of topics L in the LDA model (e.g., why MS-COCO uses 2 and 6 topics). It is suggested to add comparative experiments on the impact of different values of L on performance, along with selection guidelines.
4. The correspondence between the prompt blocks and the transformer blocks in Equation (3) is ambiguous; it is not explained how the "kth prompt injected into the ith block" is determined. It is suggested to explain this in detail with specific examples (e.g., the block selection for MS-COCO).
5. The experimental setup (lines 230-238) does not state the specific version of the ViT pre-trained weights (e.g., ViT-B/16), and the performance of different visual backbone networks (e.g., ResNet) is not compared. It is suggested to supplement a backbone-network ablation experiment.
6. Some metrics (e.g., CF, OF) of the baseline methods (e.g., CNN-RNN) in Table 1 are missing, which affects the completeness of the side-by-side comparison. It is suggested to supplement the complete data or explain why they are missing.
7. In the ablation experiments (lines 269-273), Table 4 only compares the baseline and the complete model; it does not separately verify the roles of "topic extraction" and "hierarchical prompting". It is suggested to add component-level ablation experiments to isolate the contribution of each module.
8. Regarding the limitations, only the sensitivity to the topic distribution is mentioned, and no specific direction for improvement (e.g., a dynamic topic adjustment strategy) is proposed. It is suggested to add targeted ideas for future optimization.
9. More related work on the generalization of deep learning should be cited and discussed.
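On point 3, the choice of the number of LDA topics L can be explored empirically, e.g. by comparing model fit across candidate values. A minimal sketch (not the authors' code; the toy "documents" below stand in for the label combinations of training images):

```python
# Illustrative sketch: comparing LDA perplexity across candidate topic counts L.
# The toy documents are hypothetical label-set strings, not the paper's data.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "person dog frisbee", "person dog park", "car road traffic light",
    "car bus road", "cat sofa tv", "cat dog sofa", "boat water person",
    "boat water sky", "person bicycle road", "bus traffic light road",
]
# Bag-of-labels representation: one row per image, one column per label token
X = CountVectorizer().fit_transform(docs)

for L in (2, 4, 6, 8):
    lda = LatentDirichletAllocation(n_components=L, random_state=0).fit(X)
    # Lower perplexity indicates a better fit (ideally measured on held-out data)
    print(f"L={L}: perplexity={lda.perplexity(X):.2f}")
```

Reporting such a sweep (preferably on a held-out split, alongside downstream mAP) would give readers a principled basis for the chosen L values.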

Reviewer 3

Basic reporting

This paper proposes TATHPL for multi-label image classification, enhancing the model's understanding of label semantic hierarchies by incorporating topic information. The shortcomings are as follows:
1. Lack of distinct innovation. The article primarily uses a topic model to generate topic distributions and injects them into the Transformer blocks of ViT through prompt learning. Similar attempts exist in prior work (e.g., TransHP).
2. Insufficient theoretical analysis. Although the paper introduces a "hierarchical prompt injection strategy," it does not provide rigorous theoretical analysis of, for example, why coarse-grained topics must be placed in shallow layers and fine-grained topics in deep layers. Relying solely on empirical explanations is inadequate to support the method's validity.
3. Inconsistent figure citations. For example, on page 3, "As illustrated, our model comprises two main steps..." does not specify the referenced figure; it is recommended to change it to "As shown in Figure 1 ...".
4. Recent comparative methods are missing; the comparison methods are outdated. In particular, Table 1 lacks the latest Transformer-based multi-label methods. It is suggested to include more ViT+Prompt methods that are structurally closer to this paper to highlight its competitiveness.
5. Numerous formatting and writing errors exist, including missing section and formula numbering and a mixture of Chinese and English punctuation: for example, the Chinese colon “：” in the third paragraph on page 7, a missing space in line 4 on page 5, and an extra space in line 9 on page 8.
6. The construction and training mechanism of the prompt pool are not detailed. The paper does not clarify the initialization method of the prompts, whether parameters are shared, or whether the ViT backbone is frozen during training, which affects reproducibility.

Experimental design

-

Validity of the findings

-

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.