Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on April 3rd, 2025 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on June 17th, 2025.
  • The first revision was submitted on August 11th, 2025 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on September 25th, 2025.

Version 0.2 (accepted)

Academic Editor

Accept

Thank you for your valuable contribution!

[# PeerJ Staff Note - this decision was reviewed and approved by Xiangjie Kong, a PeerJ Section Editor covering this Section #]

·

Basic reporting

-

Experimental design

-

Validity of the findings

-

Additional comments

The authors have addressed my concerns, and I recommend that this article be accepted.

Version 0.1 (original submission)

Academic Editor

Major Revisions

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.

**Language Note:** PeerJ staff have identified that the English language needs to be improved. When you prepare your next revision, please either (i) have a colleague who is proficient in English and familiar with the subject matter review your manuscript, or (ii) contact a professional editing service to review your manuscript. PeerJ can provide language editing services - you can contact us at [email protected] for pricing (be sure to provide your manuscript number and title). – PeerJ Staff

·

Basic reporting

-

Experimental design

-

Validity of the findings

-

Additional comments

This work presents a monocular depth estimation framework with several innovations intended to improve accuracy and robustness. The research topic is interesting, but several issues need to be addressed.

1) In the description of the current state of research, some examples of engineering applications should be added to better illustrate the practical relevance of the work.

2) Work on 3D CNNs and recent learning-based methods could be discussed, for example: "Edge-Assisted Epipolar Transformer for Industrial Scene Reconstruction" (doi: 10.1109/TASE.2023.3330704); "Robust Depth Estimation Based on Parallax Attention for Aerial Scene Perception," IEEE Transactions on Industrial Informatics (doi: 10.1109/TII.2024.3392270); and "Neural Rendering and Flow-assisted Unsupervised Multi-view Stereo for Real-time Monocular Tracking and Scene Perception" (doi: 10.1109/TASE.2025.3546713).

**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors are in agreement that they are relevant and useful.

3) To what extent can the proposed self-supervised monocular depth estimation framework run in real time?

Reviewer 2

Basic reporting

This paper presents a framework for unsupervised monocular metric depth estimation that integrates attention mechanisms, a vector velocity photometric loss, and an optical flow module to improve depth estimation accuracy. The motivation and model architecture seem reasonable to me. The biggest problem is the overall writing and structure of the paper: the layout, line of reasoning, and organization are unclear and cluttered. The main points are:

1. Too many redundant explanations and tables.
For example, L213-L239, L315-L357, L358-L392, L419-L434, and L495-L505 are too long and redundant; much of this material should be moved to the "preliminary" section or simply treated as known background. You should focus on your novel contributions (including HCE and speed supervision) rather than on basic information.
Tables 1, 2, 3, and 4 and Figures 2 and 5 do not need to be in the main text; shorten them and move them to the "preliminary" section or an appendix. The method section should contain one key figure similar to Figure 1. A more detailed version of Figure 1(b) is also necessary, because it shows the novelty you propose. A good sequence of figures would be: (1) a figure showing the motivation or example results of your method; (2) a figure of the overall framework; (3) figures for the detailed components; and so on.

2. Specific equations and figures
The equations are written at excessive length: there is no need to start a new line for each symbol definition, and there is too much redundant text, especially for the loss functions. The equations should be accurate and concise, so Tables 2 and 3 are too complicated. You can assume that readers and reviewers have basic knowledge of the topic and present only the most important steps.
Figures and tables should be self-explanatory, meaning readers can understand them directly from the caption. The current captions are not self-explanatory. An example of an improved caption: "Table 5. Depth evaluation on the KITTI dataset. The best results are in bold."
Some captions, such as those of Figures 8 and 9, are also confusing. The figures are not sharp; export them as PDF to obtain vector graphics.

3. Layout
Since the UMAD datasets are also a contribution, they should be described in the method section rather than in the experiment section.

My main impression is that this paper is too redundant and disorganized. I strongly recommend reading some depth estimation papers and following their structure (for example, AF-SfMLearner [1]).

[1] Shao, Shuwei, et al. "Self-supervised monocular depth and ego-motion estimation in endoscopy: Appearance flow to the rescue." Medical image analysis 77 (2022): 102338.

**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors are in agreement that they are relevant and useful.

Experimental design

Most of the experimental design is sound, apart from the problems pointed out above. The baselines compared in the article are out of date; I suggest including some recent baselines from 2023 or 2024 (e.g., Lite-Mono).

It would also help to add experiments comparing the memory usage and inference speed of the proposed model with those of the baselines.

Validity of the findings

The first two novelties are good. However, optical flow and bidirectional warping are common in self-supervised depth estimation, so I do not think this contribution should be claimed as a novelty.


All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.