All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
I believe the authors have addressed all concerns and identified several limitations of this work. I acknowledge the enormous effort the authors have put in and the additional contributions they have made. Congratulations!
[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]
I have no further comments.
I have no further comments.
I have no further comments.
I have no further comments.
The manuscript is up to the mark now.
Experimental design has been updated.
Novelty and impact have been addressed.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
Cross-database analysis should be performed.
no comment
no comment
The manuscript proposes IQA-GL, an image quality assessment (IQA) algorithm that integrates global and local features using a dual-branch architecture with Vision Transformer (ViT) and ResNet-50, enhanced by deformable convolution, bilinear pooling, and a gated recurrent unit (GRU)-based quality aggregation module. The model is evaluated on the CSIQ, TID2013, and KADID-10k datasets, demonstrating superior performance on PLCC and SRCC metrics compared to state-of-the-art (SOTA) methods. For orientation, a rough sketch of such a pipeline follows; my concerns are outlined below it.
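The sketch below is my reading of the description, not the authors' code: torchvision backbones are assumed (ViT-B/16 standing in for the paper's ViT-B-8), the deformable-convolution stage is omitted, and the bilinear pooling is simplified to an outer-product interaction that is mean-pooled rather than fully flattened.

```python
# Hypothetical sketch of a dual-branch global-local IQA model; backbone
# choices, dimensions, and wiring are assumptions, not the authors' code.
import torch
import torch.nn as nn
from torchvision.models import resnet50, vit_b_16

class DualBranchIQA(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        # Global branch: ViT-B/16 stands in for the paper's ViT-B-8.
        self.vit = vit_b_16(weights=None)
        self.vit.heads = nn.Identity()           # expose the 768-d class token
        # Local branch: ResNet-50 without its classifier head (2048-d pooled).
        cnn = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])
        self.proj_g = nn.Linear(768, embed_dim)
        self.proj_l = nn.Linear(2048, embed_dim)
        # GRU aggregates per-crop quality embeddings into one image score.
        self.gru = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, crops):                    # crops: (B, T, 3, 224, 224)
        b, t = crops.shape[:2]
        x = crops.flatten(0, 1)                  # (B*T, 3, 224, 224)
        glo = self.proj_g(self.vit(x))           # global features (B*T, D)
        loc = self.proj_l(self.cnn(x).flatten(1))  # local features (B*T, D)
        # Simplified bilinear fusion: outer product, mean-pooled over one axis.
        fused = torch.einsum('bi,bj->bij', glo, loc).mean(dim=2)
        out, _ = self.gru(fused.view(b, t, -1))
        return self.head(out[:, -1]).squeeze(-1)  # one score per image

model = DualBranchIQA()
scores = model(torch.randn(2, 3, 3, 224, 224))   # 2 images x 3 crops each
```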
1. The manuscript contains several grammatical errors. Examples include:
a. Page 6, Line 99: “the human visual is highly sensitive” lacks a noun (e.g., “system”).
b. Inconsistent terminology, such as “image block” vs. “image patch” (e.g., Page 5, Line 10 vs. Page 11, Line 251).
2. The Introduction surveys IQA methods but lacks a critical analysis of recent deep learning-based approaches combining global and local features. It does not discuss their specific limitations or position IQA-GL’s contributions clearly. For example, the claim that existing methods “simply concatenate or stack” features (Page 6, Line 108) lacks supporting evidence.
a. Expand the Introduction to compare recent hybrid IQA methods (e.g., CNN-Transformer hybrids) and their limitations, such as feature redundancy or poor spatial dependency modeling, as discussed in Ke et al. (2021).
b. Articulate how IQA-GL’s bilinear pooling and GRU-based aggregation address these gaps, referencing specific works like Lao et al. (2022a).
c. Add a table summarizing SOTA methods, their feature extraction strategies, and limitations to contextualize IQA-GL’s novelty, as recommended by Wang et al. (2003).
i. Ke, J., Wang, Q., Wang, Y., Milanfar, P., & Yang, F. (2021). MUSIQ: Multi-scale image quality transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 5148–5157.
ii. Lao, S., Gong, Y., Shi, S., Yang, S., Wu, T., Wang, J., Xia, W., & Yang, Y. (2022a). Attentions help CNNs see better: Attention-based hybrid image quality assessment network. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1140–1149.
iii. Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003). Multiscale structural similarity for image quality assessment. The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2, 1398–1402.
3. The ablation studies in Tables 4–6 on Page 17 assess global/local features, fusion modes, and submodules but lack depth. For example, ViT patch sizes, ResNet variants, and hyperparameter sensitivity are not explored. Conduct additional ablation experiments to evaluate:
a. ViT patch sizes (e.g., 8×8 vs. 16×16);
b. ResNet variants (e.g., ResNet-18, ResNet-101);
c. Alternative fusion methods (e.g., cross-attention);
d. Hyperparameter sensitivity (e.g., learning rate, batch size).
e. Present results in a table or figure to quantify each variant’s contribution to PLCC/SRCC.
f. Discuss performance vs. computational cost trade-offs for each variant.
g. The manuscript evaluates IQA-GL on three datasets but does not test cross-dataset generalization (e.g., training on TID2013 and testing on KADID-10k), which limits insight into robustness across unseen distortions. Perform cross-dataset experiments and report PLCC/SRCC (a minimal metric computation is sketched after this list).
4. The manuscript claims strong adaptability and robustness (Page 16, Line 37) but only briefly mentions limitations with extreme distortions and unseen real-world degradations (Page 18, Line 467). Specific failure cases or distortion types are not analyzed.
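To make the PLCC/SRCC reporting requested in items 3e and 3g concrete, here is a minimal sketch using SciPy; `mos` and `preds` are synthetic stand-ins for ground-truth mean opinion scores and model predictions on a dataset unseen during training (e.g., train on TID2013, test on KADID-10k).

```python
# Minimal sketch of PLCC/SRCC computation for a cross-dataset test split;
# the arrays below are synthetic stand-ins, not real experimental results.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
mos = rng.uniform(0, 9, size=500)            # stand-in ground-truth MOS
preds = mos + rng.normal(0, 1.0, size=500)   # stand-in model predictions

plcc, _ = pearsonr(preds, mos)    # linear correlation (prediction accuracy)
srcc, _ = spearmanr(preds, mos)   # rank correlation (prediction monotonicity)
print(f"PLCC={plcc:.4f}  SRCC={srcc:.4f}")
```

Note that PLCC is often reported after fitting a logistic mapping between predictions and MOS; the raw correlation above is the simplest variant.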
The manuscript presents a technically strong contribution to full-reference image quality assessment. While the methodological novelty is incremental, it is well-supported through careful ablation and comparisons.
Only two datasets (CSIQ and TID2013) are used. These are established but aging benchmarks. Consider testing on a third, more modern dataset (e.g., KADID-10k or LIVE) to demonstrate generalization.
Generalization is not convincingly shown; there’s no mention of cross-dataset testing (e.g., train on CSIQ, test on TID2013). Add or propose cross-dataset results to support claims of robustness.
Minor revision is required.
Please follow the requests and comments of the reviewers in detail!
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
The authors propose an image quality assessment algorithm based on global–local feature fusion (IQA-GL). The idea is novel, and experiments show that the proposed method achieves the best performance across different datasets, but the writing is not good enough. The comments are listed below.
1. Line 68, “Yan Yan et al. (2021) proposed…”: the duplicated “Yan” should be deleted.
2. The authors should check all instances of “XXX et al. proposed” and delete any duplicated surname.
3. The authors cited only 32 references; it is better to cite about 50.
4. In Figure 2, the authors should add a brief introduction of the framework after the figure caption. All figures should be checked and a brief introduction added after each caption.
It is better to center all equations on the line.
Clarify the rationale for selecting ResNet-50 and ViT-B-8 as backbones, comparing their performance to other potential architectures (e.g., EfficientNet, Swin Transformer).
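As a starting point for such a comparison, a minimal hypothetical harness using timm to instantiate candidate backbones as feature extractors (the model names are timm identifiers; the training and evaluation loop is assumed and omitted):

```python
# Hypothetical backbone-comparison harness; parameter counts give a first
# sense of the capacity/cost differences between candidate architectures.
import timm

for name in ["resnet50", "efficientnet_b0", "swin_base_patch4_window7_224"]:
    backbone = timm.create_model(name, pretrained=False, num_classes=0)
    n_params = sum(p.numel() for p in backbone.parameters()) / 1e6
    print(f"{name}: {n_params:.1f}M parameters")
```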
1. Include a brief description of the dataset preprocessing steps and metadata in the "Data Availability" section to enhance transparency.
2. Provide a table or section detailing all hyperparameters (e.g., learning rate, batch size, and the ResNet-50 and ViT-B-8 configurations) to ensure reproducibility.
3. Justify the choice of 250 training iterations and the cosine annealing scheduler, possibly with a convergence plot or validation loss curve (a minimal sketch of such a schedule appears after this list).
4. The authors acknowledge limitations in handling extreme distortions and unseen real-world degradations (Page 15), which is commendable. However, the manuscript lacks a discussion on computational complexity or inference time, which is critical for practical IQA applications.
5. Clarify the 5%-8% improvement claim by explicitly stating the metrics and algorithms compared in the Abstract and Conclusion.
6. Include a discussion on computational complexity (e.g., FLOPs, inference time) and compare it with baseline methods to assess practical applicability.
7. Expand the ablation study to include the impact of individual sub-modules (e.g., deformable convolution, GRU) beyond fusion strategies.
8. The lack of computational complexity analysis limits the understanding of the method's practical feasibility.
9. Discuss the trade-offs between performance and computational cost to highlight practical applicability.
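For item 3 above, here is a minimal sketch of what a cosine-annealed schedule over 250 iterations looks like and the curves one would plot; the linear model and random batches are stand-ins for IQA-GL and its data loaders, and a real run would log a held-out validation loss rather than the training loss shown here.

```python
# Hypothetical sketch of a cosine-annealed training schedule; the model and
# data below are stand-ins, not the authors' training code.
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=250)

lr_curve, loss_curve = [], []
for it in range(250):
    x, y = torch.randn(16, 10), torch.randn(16, 1)   # stand-in batch
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
    lr_curve.append(sched.get_last_lr()[0])          # decays cosine-wise toward 0
    loss_curve.append(loss.item())
# Plotting lr_curve and loss_curve against iteration gives the requested
# convergence evidence (learning-rate schedule plus loss curve).
```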
Good
The manuscript is written in clear and professional English, making it accessible.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.