Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on October 24th, 2024 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on January 17th, 2025.
  • The first revision was submitted on February 19th, 2025 and was reviewed by 2 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on March 25th, 2025.

Version 0.2 (accepted)

· Mar 25, 2025 · Academic Editor

Accept

The revised manuscript has been reviewed by the reviewers, who are satisfied with the corrections made.

[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]

·

Basic reporting

The revision aligns with my previous feedback. I have no further comments.

Experimental design

The revision aligns with my previous feedback. I have no further comments.

Validity of the findings

No comment.

Additional comments

The revision aligns with my previous feedback. I have no further comments.

·

Basic reporting

No comment.

Experimental design

No comment.

Validity of the findings

No comment.

Additional comments

Dear authors,
Following your thorough revision, the article meets the standards in all areas, so I have no suggestions for improvement.
Thank you for your efforts.

Version 0.1 (original submission)

· Jan 17, 2025 · Academic Editor

Major Revisions

Please respond in detail to all the comments from the reviewers.

·

Basic reporting

The paper focuses on developing a vehicle surveillance system using aerial images, employing a combination of fuzzy C-means (FCM) clustering for segmentation, YOLOv4 for vehicle detection, and DeepSORT for multi-vehicle tracking. Overall, the paper is interesting; detailed comments follow.

The standards are mostly met. One question:
The significance of segmentation in image preprocessing is not sufficiently explained. A clearer explanation is needed to illustrate why segmentation is a critical step in the proposed system.

Experimental design

The approach is validated on two datasets (UAVDT and KIT-AIS) and demonstrates high precision in vehicle detection.

The standards are mostly met. One question:
The two datasets appear relatively small. How well does your proposed pipeline scale to large-scale traffic scenarios?

Validity of the findings

Key innovations include improved segmentation accuracy via FCM, enhanced vehicle tracking with ID assignment using Speeded-Up Robust Features (SURF), and precise trajectory mapping.

The standards are mostly met.

Additional comments

1. The significance of segmentation in image preprocessing is not sufficiently explained. A clearer explanation is needed to illustrate why segmentation is a critical step in the proposed system.
2. The two datasets appear relatively small. How well does your proposed pipeline scale to large-scale traffic scenarios?
3. In Figure 7, why do the UAVDT and KIT-AIS datasets appear to be exactly the same? The detection results are also identical.
4. Can a UAV remain stationary at a single intersection for a period of time? Estimating traffic flow or conditions often requires observations over a defined time window to capture dynamic changes accurately.

·

Basic reporting

The manuscript is well-written, employing professional and clear English. However, certain phrases could benefit from rephrasing to enhance clarity and precision. For instance, the sentence "One of the significant is the extraction of foreground (vehicle) from complicated traffic scenarios" should be rephrased as "One significant challenge is the accurate extraction of vehicle foregrounds in complex traffic scenarios." Additionally, ensuring consistent terminology throughout the text would improve its readability; for example, the term "DeepSort" should be used uniformly instead of alternating with "DeepSORT."
The introduction provides sufficient context for the study and clearly outlines the motivation behind the research. The authors successfully highlight the significance of monitoring traffic using aerial images and the challenges associated with current methods. Nevertheless, the introduction could be strengthened by elaborating on the impact of the chosen segmentation techniques and offering a more comprehensive comparison with alternative methods. Furthermore, the discussion of related work should be expanded to emphasize how the proposed model innovates upon or surpasses existing approaches in terms of accuracy, scalability, or applicability.
The manuscript adheres to PeerJ's structural standards and is organized logically, with distinct sections for the introduction, methodology, results, and discussion. While the overall structure facilitates understanding, certain sections, particularly "Materials & Methods," would benefit from additional detail to enhance clarity. The authors should elaborate on pre-processing steps, such as specifying the types of noise removed and the criteria for selecting the gamma correction value. Similarly, the implementation details of key algorithms, including Random Forest, Fuzzy C-Means (FCM), YOLOv4, DeepSort, and SURF, should be described more comprehensively, focusing on their configurations and their specific roles in the system. Additionally, the process of ID assignment and recovery using SURF features should be clarified, particularly the criteria used to determine successful ID recovery.
The results section is presented clearly, but it lacks sufficient accessibility and interpretability for readers. Including visual examples that illustrate the outcomes of segmentation, detection, tracking, and trajectory estimation would significantly enhance the presentation. The tables summarizing performance metrics are helpful but require further explanation. Each metric, such as precision, recall, F1-score, and quality, should be briefly defined, and its relevance to the system's evaluation should be explicitly stated. This additional context would aid readers in understanding the significance of the reported values.
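As an illustration of the brief definitions requested above, the four metrics can be written in terms of true positives (TP), false positives (FP), and false negatives (FN). This is a minimal sketch and is not taken from the manuscript. In particular, "quality" is assumed here to mean TP / (TP + FP + FN), a definition often used in aerial vehicle detection work; the authors may define it differently. The function name `detection_metrics` is hypothetical.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute detection metrics from raw counts.

    tp: detections matched to a ground-truth vehicle
    fp: detections with no matching ground-truth vehicle
    fn: ground-truth vehicles that were missed
    """
    # Precision: fraction of reported detections that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of ground-truth vehicles that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-score: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Quality (assumed definition): penalizes both error types at once.
    total = tp + fp + fn
    quality = tp / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "quality": quality}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(detection_metrics(tp=90, fp=10, fn=30))
```

Under this assumed definition, quality can never exceed either precision or recall, which is one reason it is worth stating explicitly next to the tables.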
Formal definitions and rigorous explanations of key terms and equations are currently insufficient. Concepts such as gamma correction, feature importance in Random Forest, the FCM objective function, and the system's evaluation metrics should be clearly defined. Detailed justifications or derivations of the equations used in the manuscript would improve transparency and reproducibility. For more complex equations, illustrative numerical examples could further aid comprehension. It is also important to ensure that mathematical notation is consistent throughout the manuscript, as inconsistencies can lead to confusion.
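For reference, the kind of formal definition being requested might look like the following. These are the standard textbook forms of gamma correction and the FCM objective, written in assumed notation; the manuscript's own symbols and any variants the authors use may differ.

```latex
% Gamma correction of a pixel intensity normalized to [0, 1]
I_{\mathrm{out}} = I_{\mathrm{in}}^{\gamma}, \qquad I_{\mathrm{in}} \in [0, 1], \quad \gamma > 0

% Fuzzy C-means objective for N samples x_i and C cluster centres c_j
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2},
\qquad \sum_{j=1}^{C} u_{ij} = 1 \ \text{for all } i, \quad m > 1
```

Here $u_{ij}$ is the degree of membership of sample $x_i$ in cluster $j$, and $m$ is the fuzzifier that controls how soft the cluster assignments are. With this convention, $\gamma < 1$ brightens an image and $\gamma > 1$ darkens it.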
While the limitations of the proposed model are mentioned, the discussion could be expanded to provide a deeper analysis. For example, the manuscript acknowledges that errors in earlier stages of the pipeline can propagate, but it does not propose potential strategies for mitigating these issues. Similarly, the challenges of scaling the model to handle more complex traffic scenes should be discussed in greater detail, along with possible optimizations for improving computational efficiency. A more thorough exploration of these aspects would strengthen the manuscript and provide valuable insights for future work.

Experimental design

The experimental design aligns well with the journal's aims and scope and addresses a relevant problem in the field of aerial vehicle surveillance. The methods used are technically robust and described in sufficient detail to provide a foundation for replication. However, several areas require improvement to ensure the transparency, reproducibility, and overall rigor of the study.
The manuscript provides an overview of the computational infrastructure, mentioning that experiments were conducted on a Windows-based system with an Intel i5 processor and 8GB of RAM. However, critical details such as the use of a GPU and the versions of software frameworks, libraries, and tools employed (e.g., TensorFlow, PyTorch) are missing. Including these details would significantly enhance the reproducibility of the experiments. Furthermore, while the authors mention the datasets used (UAVDT and KIT-AIS), there is insufficient detail regarding their preparation for training and evaluation. It is essential to describe how the datasets were split into training, validation, and test sets, and to clarify whether any data augmentation techniques were applied.
The justification for selecting the methods, particularly FCM for segmentation, YOLOv4 for detection, and DeepSort for tracking, is outlined but not sufficiently robust. The manuscript would benefit from a more in-depth discussion comparing these choices with alternative methods, considering their strengths and limitations in the specific context of aerial vehicle surveillance. For instance, while YOLOv4 is known for its efficiency and accuracy, the authors should discuss why it was preferred over other object detection models like EfficientDet or SSD, especially regarding their performance in complex traffic scenarios. Similarly, the advantages of FCM over Random Forest segmentation should be further substantiated with quantitative or qualitative comparisons.
The discussion of data preprocessing is adequate but could be expanded to include the rationale behind key parameter choices. For example, the gamma correction applied during preprocessing is mentioned, but the criteria for selecting the gamma value and its impact on subsequent segmentation and detection performance are not explained. Detailing such decisions would provide valuable insights into the methodology.
Evaluation methods are briefly described, but the process for obtaining the performance metrics (precision, recall, F1-score, quality, etc.) lacks clarity. It is unclear whether these metrics are calculated at the level of individual images, entire video sequences, or some other granularity. This ambiguity complicates the interpretation of results and their comparison to prior work. A clear and precise explanation of the evaluation protocol is necessary, including a description of how metrics are aggregated and whether any statistical significance testing was performed.
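The granularity question matters because the two common aggregation choices can give noticeably different numbers on the same detections. The sketch below contrasts averaging precision per image with pooling counts over the whole sequence. It is an illustration with made-up counts, not the authors' protocol, and the function names are hypothetical.

```python
from typing import List, Tuple

# Each entry holds (tp, fp, fn) counts for one image or frame.
Counts = List[Tuple[int, int, int]]


def per_image_mean_precision(counts: Counts) -> float:
    """Average the precision computed separately for each image."""
    values = [tp / (tp + fp) for tp, fp, _ in counts if (tp + fp) > 0]
    return sum(values) / len(values) if values else 0.0


def pooled_precision(counts: Counts) -> float:
    """Sum counts over all images first, then compute one precision."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # One sparse frame with a poor result and one dense frame with a good one.
    frames = [(1, 1, 0), (99, 1, 0)]
    print(per_image_mean_precision(frames))  # sparse frame weighs as much as the dense one
    print(pooled_precision(frames))          # dense frame dominates
```

Stating which of these (or which other scheme) was used would make the reported values directly comparable with prior work.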
The manuscript references a broad spectrum of relevant works, ensuring appropriate citation of prior research. However, it could further emphasize how the proposed approach advances the state of the art. Highlighting the specific contributions of the manuscript, such as improved accuracy or efficiency compared to existing methods, would better establish its novelty and impact.

Validity of the findings

The findings presented in the manuscript suggest promising improvements in precision and speed compared to existing methods. However, there are several areas where the validity and impact of the results could be further strengthened through deeper analysis and clearer presentation.
While the authors include some comparisons with state-of-the-art methods in Tables 6 and 7, these comparisons are limited in scope. The manuscript does not provide sufficient detail regarding the experimental designs, model configurations, or dataset characteristics used in the referenced studies. A more thorough analysis of how the proposed system differs from and improves upon these methods is necessary. This should include an in-depth discussion of the advantages and limitations of the proposed approach, especially in the context of specific challenges in aerial vehicle surveillance.
The interpretation of the results is presented superficially, and the manuscript does not adequately analyze how each stage of the system—segmentation, detection, and tracking—contributes to the overall performance. For instance, it would be valuable to understand how errors in segmentation affect subsequent detection and tracking stages and how these errors propagate through the system. Furthermore, while performance metrics such as precision, recall, and F1-score are reported, their implications are not discussed in depth. Including a breakdown of the results by stage and a more detailed interpretation of the metrics would provide greater insight into the system's strengths and weaknesses.
A sensitivity analysis is notably absent from the manuscript. Evaluating how variations in key parameters, such as the gamma value used in preprocessing or the similarity threshold in the ID recovery process, impact system performance would provide important insights into its robustness. Sensitivity analysis is particularly important for assessing how well the system might generalize to scenarios beyond the tested datasets, such as more complex urban environments.
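A sensitivity analysis of this kind need not be elaborate. The sketch below shows one possible shape for it: apply gamma correction at several values and record a score for each. It assumes 8-bit images and the standard power-law form of gamma correction. `score_fn` is a hypothetical placeholder for whatever the authors would measure (for example, detection F1 after running the full pipeline); mean brightness is used in the usage example only so that the code runs on its own.

```python
import numpy as np


def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    """Apply power-law gamma correction to an 8-bit image."""
    normalized = image.astype(np.float64) / 255.0
    corrected = 255.0 * normalized ** gamma
    return np.rint(corrected).astype(np.uint8)


def sweep_gamma(image: np.ndarray, gammas, score_fn) -> dict:
    """Return {gamma: score} for each candidate gamma value.

    score_fn is a placeholder for the real evaluation, e.g. a function
    that runs segmentation and detection and returns a metric.
    """
    return {g: float(score_fn(gamma_correct(image, g))) for g in gammas}


if __name__ == "__main__":
    # Tiny synthetic image; a real study would use dataset frames.
    img = np.array([[0, 64, 255]], dtype=np.uint8)
    for g, score in sweep_gamma(img, [0.5, 1.0, 2.0], np.mean).items():
        print(f"gamma={g}: score={score:.2f}")
```

The same loop structure would apply to the similarity threshold used in ID recovery, with the threshold swept in place of gamma.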
The conclusions are consistent with the results presented but remain too general. While the authors assert that their model outperforms others, they do not specify which methods or under what conditions this improvement is observed. The conclusions would benefit from a more detailed focus on the study's specific findings, practical implications, and potential applications in real-world surveillance scenarios. Additionally, a more comprehensive discussion of future directions, such as the integration of advanced deep learning techniques or optimization strategies to reduce computational demands, would enhance the manuscript's value.

Additional comments

In addition to the observations made in the primary review dimensions, there are general comments aimed at improving the overall clarity, readability, and presentation quality of the manuscript.
The visual presentation of figures could be significantly improved. For example, Figure 8, which illustrates ID assignment and recovery, is somewhat confusing in its current form. Enhancing the visual quality of the figures and ensuring they are accompanied by clear labels and informative legends would make the data more accessible to readers. Additionally, it is essential to ensure consistency in the design and formatting of all figures to maintain a professional presentation throughout the manuscript.
The quality of the writing is adequate overall, but there are areas where improvements in grammar, punctuation, and style would enhance the flow and clarity of the text. Ensuring consistency in the use of technical terms, as noted in previous comments, is crucial. A thorough editorial review is recommended to refine the text and improve its readability for an international audience.
The "Results" section, while comprehensive, could benefit from a more structured presentation. Dividing this section into clear subsections that align with the stages of the proposed system (segmentation, detection, and tracking) would improve the organization and facilitate the reader's understanding of the findings. This restructuring would also help emphasize the contributions of each system component and their respective impacts on overall performance.
The references section is extensive and covers a wide range of relevant works; however, some sources appear outdated. Updating the references to include more recent studies reflecting the current state of the art in aerial vehicle surveillance would enhance the manuscript's relevance and strengthen its position within the field. This update would also ensure that the paper appropriately contextualizes its contributions within recent advancements.
Finally, while computational infrastructure has been mentioned, additional details about the software environment, including the specific versions of libraries and frameworks used, would support better reproducibility of the experiments. The formatting of equations also requires attention; all symbols should be clearly defined upon first use, and a uniform presentation style should be adopted to avoid confusion.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.