Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on February 8th, 2021 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on April 20th, 2021.
  • The first revision was submitted on October 8th, 2021 and was reviewed by 1 reviewer and the Academic Editor.
  • The article was Accepted by the Academic Editor on November 1st, 2021.

Version 0.2 (accepted)

· Nov 1, 2021 · Academic Editor

Accept

Thank you for your thoughtful edits. The article is in good shape, and we congratulate you on its acceptance!

·

Basic reporting

In their revised manuscript, the Authors have satisfactorily addressed my main criticism and suggestions. The revised manuscript is clear and well-written, and its overall structure has improved. Many minor confusing points have been clarified.

Experimental design

The methodology proposed is scientifically sound.

Validity of the findings

The Authors have been very careful in delimiting the scope of validity of their method, its advantages and shortcomings.

Version 0.1 (original submission)

· Apr 20, 2021 · Academic Editor

Major Revisions

Please improve the problem statement, research question and rationale of the approach. As per reviewer 2, please also consider in detail the consequences of hypercube corrections in the case of data with distributions other than the uniformly sampled hypercube.

As reviewer 2 suggests, please consider making the code available, as this is best practice for methodological papers such as yours.

Please also adjust the structure of the paper to conform to PeerJ guidelines, according to reviewer 1 as well.

Reviewer 1 ·

Basic reporting

The language is generally clear, but it could be improved in certain sections. The literature is well referenced. The structure does not conform to the PeerJ Computer Science standard (namely, the Methods section should be placed right after the Introduction, not after the Discussion). The figures are high quality, but adding more detail to the corresponding captions would aid their comprehension.

Experimental design

The article tackles a critical problem in the computer science field - dimensionality estimation. The research question, however, should be better stated in the introduction. Specifically, the authors should clearly state the rationale and the aim of their study. The Methods should be moved after the Introduction and before the Results to help the reader follow the flow of the article.

Validity of the findings

The proposed approach addresses certain limitations of the algorithm that it is based on. The authors performed an extensive comparison of their approach with other state-of-the-art algorithms on various synthetic datasets. An additional comparison of the proposed approach with such algorithms on neural data would be ideal. Conclusions are missing.

Additional comments

The paper sets out to revisit a dimensionality estimation algorithm, the manifold-adaptive Farahmand-Szepesvári-Audibert (FSA) estimator. The authors first computed the local estimates following the original FSA pipeline and then took the median of these local estimates to obtain the global estimate of the dimensionality. They further corrected for finite-sample effects by implementing a correction formula, and finally compared the performance of their algorithm with that of the original FSA as well as other state-of-the-art techniques. The proposed approach outperforms the original FSA and performs similarly to other methods when applied to synthetic datasets. When applied to neural signals recorded during epileptic seizures, the authors hypothesize that low-dimensional brain regions might be potential sources of the seizure onset.
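For concreteness, the median-of-local-estimates idea summarized above can be sketched in a few lines. This is only an illustration of the standard kNN distance-ratio form of the FSA local estimate (d(x) = ln 2 / ln(R_2k(x)/R_k(x)), with R_k the distance to the k-th nearest neighbour), not the authors' code; it omits the finite-sample correction that is the paper's main contribution, and the function name and test data are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def fsa_median_id(X, k=1):
    """Median-of-locals FSA intrinsic-dimension estimate (sketch).

    Local estimate at each point x: d(x) = ln 2 / ln(R_2k(x) / R_k(x)),
    where R_k(x) is the distance from x to its k-th nearest neighbour.
    The global estimate is the median over all points.
    """
    tree = cKDTree(X)
    # Query 2k+1 neighbours; column 0 is the query point itself (distance 0).
    dists, _ = tree.query(X, k=2 * k + 1)
    r_k = dists[:, k]        # distance to k-th neighbour
    r_2k = dists[:, 2 * k]   # distance to 2k-th neighbour
    local = np.log(2.0) / np.log(r_2k / r_k)
    return np.median(local)

# Uniform sample from a 3-D cube embedded in 10-D ambient space.
rng = np.random.default_rng(0)
X = np.zeros((2500, 10))
X[:, :3] = rng.random((2500, 3))
print(fsa_median_id(X, k=5))  # close to 3; the slight downward bias
                              # is the finite-sample effect corrected here
```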

The overall structure of the paper is generally easy to follow. However, certain parts of the manuscript can be improved and some details need to be added. Specific comments follow.

1. Line 96. I suggest you provide more justification for your study. What is the rationale of your approach? How do you expect your result to differ from the original FSA algorithm? Also, could you be clearer when you say 'we correct the underestimation effect by an exponential formula'?
2. Line 98, end of introduction. Could you please add some description of your following section in a clearer way?
3. Authors should also compare their approach to the original FSA (or to some other methods) on the neural dataset and not the synthetic ones only.
4. The Methods section should be moved right after the Introduction. This would allow the authors to describe the proposed approach before showing the corresponding results. Moreover, some results are already described in this section (e.g., lines 252, 256, 274). Those sentences should be removed and included in the Results section only. Also, please place the figures closer to where they are referred to in the main text.
5. Authors should add a conclusion section.
6. Line 167, cmFSA acronym should be defined before use.
7. Line 244, authors should justify why they chose to test those three specific values of k.
8. There are some inconsistencies in the notation for the true and the predicted dimensionality. According to line 255, D indicates the true dimensionality and d the predicted dimensionality. However, in line 270 you use d to indicate the true dimensionality and d̂ for the predicted one. Please choose one notation and stick with it.
9. The captions of each figure should be more detailed. Specifically, they should briefly describe the take-home message of the figure (one or two sentences are enough).
10. Figure 1: it is not clear to the reviewer how the histogram was obtained. Didn’t you test only one realization in this case (line 242)?
11. Table I: what are cmFSA_fr and M_DANCO_fr?

·

Basic reporting

The manuscript is sound and well written. References are generally exhaustive. The methodology is clearly explained.

Experimental design

The proposed method seems to be competitive with state-of-the-art methods, and I believe it offers some advantages with respect to known issues of ID estimators, in particular boundary effects and variations in the density of points in the data. I think the manuscript is a fair contribution to the field of ID estimation, and it can be of interest to researchers in this area, and more generally to researchers needing accurate ID estimation as part of their data analysis pipelines.

Validity of the findings

Major comments:

1) In my opinion, the main problem with the boundary-effect correction is that it is optimized for uniformly sampled hypercubes, and may lead to overestimation of the ID in cases where the data are not uniformly sampled. This is clearly visible from Table I: while the estimation is nearly perfect for uniformly sampled data on linear subspaces [M2, M9, M10a-c], or generally uniformly sampled data on locally flat spaces [M5, M7, M13], it yields an overestimation in the case of non-uniformities, such as the Gaussian case [M12], the non-linear manifold case [M6], or the sphere [M1]. The overestimation may be even more severe for non-uniform samplings with heavy-tailed distributions, such as the Cauchy distribution used in Facco et al. (2015). The authors should comment extensively on this point.

2) Since this is a methodological work, I would recommend that the authors make the code implementing cmFSA publicly available.

3) It is not clear how the different sample sizes were included in the calibration of the correction term. It seems that the calibration term used to infer the ID of datasets M1-M13 was inferred from the n = 2500 hypercubes. Is one going to use the same term with datasets of different n? It seems that one should rather use a term calibrated on that specific n. The authors should comment on this point. Furthermore, why was k = 5 used for calibration, instead of the k = 1 used in the subsequent analyses?


Minor comments in attached PDF.

Additional comments

Comments in attached PDF.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.