Javascript is disabled in your browser. Please enable Javascript to view PeerJ.

Review History
Multimodal image fusion for enhanced vehicle identification in intelligent transport

All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

View examples of open peer review.

Summary

The initial submission of this article was received on March 28th, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
The Academic Editor made their initial decision on August 1st, 2025.
The first revision was submitted on August 22nd, 2025 and was reviewed by 2 reviewers and the Academic Editor.
The article was Accepted by the Academic Editor on September 15th, 2025.

Version 0.2 (accepted)

Consolato Sergi · Sep 15, 2025 · Academic Editor

Accept

Congratulations on your valuable contribution to our journal.

[# PeerJ Staff Note - this decision was reviewed and approved by Shawn Gomez, a PeerJ Section Editor covering this Section #]

Javier Sánchez-Soriano · Sep 8, 2025

Basic reporting

no comment

Experimental design

no comment

Validity of the findings

no comment

Additional comments

I am satisfied with the new version of this manuscript and my advice is this paper can be accepted.

Cite this review as

Sánchez-Soriano J (2025) Peer Review #1 of "Multimodal image fusion for enhanced vehicle identification in intelligent transport (v0.2)". PeerJ Computer Science

Reviewer 2 · Sep 4, 2025

Basic reporting

Address all the basic concepts that are necessary for that article.

Experimental design

The experimental design is now clear and follows the aim of the journal.

Validity of the findings

This articles have suitable novelty that is necessary for publication and dominance.

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "Multimodal image fusion for enhanced vehicle identification in intelligent transport (v0.2)". PeerJ Computer Science

Download Version 0.2 (PDF) Download author's response letter (v0.2) - submitted Aug 22, 2025

Version 0.1 (original submission)

Consolato Sergi · Aug 1, 2025 · Academic Editor

Major Revisions

Please follow all requests and address all criticisms thoroughly.

**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff

Javier Sánchez-Soriano · Apr 28, 2025

Basic reporting

- It is recommended that all techniques, acronyms, and methods used be defined at first use to facilitate reading, especially for readers who are not completely familiar with all terms (e.g., briefly explain acronyms such as HOG, BRISK, ViT). As an example, the first occurrence of HOG is on line 40, but is nevertheless developed on line 106.

- Although the references are adequate and relevant, the state of the art could be strengthened by including some very recent (last 1-2 years) citations on multimodal sensing or ViT applications in airborne vision, to place the work even better within the current context of the field.

Experimental design

- The paper uses a combination of advanced techniques (YOLOv4, Vision Transformer with HOG and BRISK, ResNet-18). It is recommended to include a brief discussion justifying why these specific architectures were selected over other possible alternatives.

- It would also be valuable to indicate whether other architectures or approaches were evaluated, and what the decision criteria were for opting for the chosen models.

- Table 1 provides the values of the key hyperparameters. It would be useful to briefly explain whether any systematic search for hyperparameters was performed (e.g., grid search, random search, Bayesian optimization) or whether the values were established manually based on previous work. This would strengthen the perception of rigor in the experimental design.

Validity of the findings

- The paper shows that their approach is better than other methods for detecting and classifying vehicles in aerial images, but it doesn't talk about how new this approach is or what the impact might be. It is suggested to add a section or subsection within the Discussion or Conclusion that clearly highlights how this approach is new (e.g., combining depth map generation with attention, guided filtering, YOLOv4, and HOG/BRISK in ViT) and discusses how it can be used in practice, for example, for traffic surveillance or road safety.

Additional comments

- Although the cases where the model fails (e.g., truncation and occlusion of vehicles) are identified, this section could be enriched by further characterizing these errors (are they more frequent in certain classes of vehicles, flight heights, camera angles, light conditions, etc.) and proposing specific ideas to mitigate them in future works.

- It would be convenient to review the uniformity in the format of the figures (same decimal notation, same color coding in similar graphs, etc.) to improve the visual presentation of the article.

Cite this review as

Sánchez-Soriano J (2025) Peer Review #1 of "Multimodal image fusion for enhanced vehicle identification in intelligent transport (v0.1)". PeerJ Computer Science

Reviewer 2 · Jul 23, 2025

Basic reporting

• Clear English is used, but minor grammatical errors exist (such as "In there work" to "In their work," ).
• Introduction adequately motivates the study but could better highlight gaps addressed by the proposed fusion method.
• Structure aligns with PeerJ standards, though the "Related Work" section lacks depth in multi-modal fusion.

Experimental design

• Methods are described with sufficient detail for replication, including preprocessing (bilateral filter, gamma correction), depth generation (encoder-decoder + attention), fusion (guided filtering), and feature extraction (ViT + HOG/BRISK).

• Hyperparameters are well-documented (Table 1), and datasets are publicly accessible.

• Evaluation metrics (precision, recall, F1-score, confusion matrices) are appropriate.

• Computational infrastructure (Intel i3, 4GB RAM) is inadequate for training ViT/YOLOv4. Clarify if GPUs were used, as this impacts reproducibility and real-time claims.

• Depth map generation lacks training details (e.g., loss function, dataset). Specify if depth maps were synthetically generated or used as ground truth.

Validity of the findings

General Comments

• Fig. 1: Simplify the workflow diagram to clarify data flow (e.g., depth generation → fusion → detection).

• Table 5: Include missing comparisons (e.g., VAID for He et al. 2021) and standardize reference formatting ("Lin et al." → "Lin et al.").

• Preprocessing: Specify parameters for bilateral filtering (σ) and gamma correction (γ).

• Feature Extraction: Justify HOG/BRISK integration in ViT (e.g., how they enhance local features).

• Class Imbalance: Report per-class precision/recall to address misclassifications (e.g., AU-AIR "Motorbike" to "Cycle").

• Grammar: Fix tense inconsistencies (e.g., "We transferred" to "We transfer," ).

References: Ensure all citations match the reference list (e.g., "Ahmed et al. 2024" in text vs. "Ahmed and Jalal 2024" in Refs).

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "Multimodal image fusion for enhanced vehicle identification in intelligent transport (v0.1)". PeerJ Computer Science

Download Original Submission (PDF) - submitted Mar 28, 2025

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Review History Multimodal image fusion for enhanced vehicle identification in intelligent transport

Summary

Version 0.2 (accepted)

Consolato Sergi · Sep 15, 2025 · Academic Editor

Javier Sánchez-Soriano · Sep 8, 2025

Basic reporting

Experimental design

Validity of the findings

Additional comments

Reviewer 2 · Sep 4, 2025

Basic reporting

Experimental design

Validity of the findings

Version 0.1 (original submission)

Consolato Sergi · Aug 1, 2025 · Academic Editor

Javier Sánchez-Soriano · Apr 28, 2025

Basic reporting

Experimental design

Validity of the findings

Additional comments

Reviewer 2 · Jul 23, 2025

Basic reporting

Experimental design

Validity of the findings

Review History
Multimodal image fusion for enhanced vehicle identification in intelligent transport