The paper can be accepted.
[# PeerJ Staff Note - this decision was reviewed and approved by Claudio Ardagna, a PeerJ Section Editor covering this Section #]
The manuscript succeeds in basic reporting.
The research question is well defined, relevant, and meaningful.
The knowledge gap is identified and I could imagine that the proposed research fills this gap.
The experimental design is reasonable.
Methods are described with sufficient detail.
Replication is almost complete. I haven't found how to get the synthetic datasets, but I don't find this access crucial.
The findings are demonstrated through a limited computational experiment, but within this controlled experiment, the findings seem valid.
The manuscript and supplemental code provide enough foundation to replicate and continue this research.
The authors revised their manuscript and elaborated in their rebuttal.
With the extensions to the manuscript and all the arguments, I agree that this manuscript contains sufficient progress and rigor to warrant publication.
I especially like that there seems to be an agreement on future work.
I also see that placing the proposed Saturn coefficient within the DimRed taxonomies might be a target for future work as well.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
The manuscript succeeds in basic reporting.
The research question is well defined, relevant, and meaningful.
The knowledge gap is identified, and I could imagine that the proposed research might fill this gap.
The experimental design needs revision and clarification.
Methods are described with sufficient detail.
Replication is almost complete. I haven't found how to get the synthetic datasets, but I don't find this access crucial.
More details are discussed in the additional comments.
Unfortunately, the validity of the findings is questionable. As there are errors and shortcomings in the experimental design, the current findings are not well-founded.
The authors revised their manuscript, and I want to extend on my previous review.
Regarding the former weaknesses I identified, the authors improved their manuscript as follows:
* W1: How Does the Saturn Coefficient Capture Both Global and Local Structures Simultaneously?
The authors argue that using point distances as input to the Adjusted RV coefficient results in a metric that captures both global and local structure simultaneously.
While this working principle was the premise of my initial statement for this weakness, I think there might be a misunderstanding about the semantics of "global" and "local" structure preservation or corresponding quality metrics.
Coming from the fields of dimensionality reduction and information visualization, I refer to the classes of "global" and "local" structure as follows [R4]:
- "local modeling methods rely on neighborhood information of each data instance"
- "global measures, where the quality of a mapping is assessed by pairwise comparing (dis)similarity"
Following from these usages, I consider the Saturn coefficient a global metric unless the authors demonstrate that their approach considers the neighborhood as well (read: sets of neighbors disregarding their actual distances).
Another way of approaching this distinction might be to think of local techniques to assess a (small) subset of data points and global techniques to assess all the data points.
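This distinction can be made concrete with a minimal Python sketch (toy data, not the authors' setup) that contrasts a global measure, the correlation of all pairwise distances, as in the Saturn working principle, with a local measure that only compares k-nearest-neighbor sets while disregarding actual distances:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
X_high = rng.normal(size=(100, 10))   # toy high-D data
X_low = X_high[:, :2]                 # toy "projection": simply drop dimensions

# Global: correlate ALL pairwise distances between high-D and low-D.
r, _ = pearsonr(pdist(X_high), pdist(X_low))

# Local: average overlap of k-nearest-neighbor sets, ignoring distances.
def knn_sets(X, k):
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

k = 7
nn_high, nn_low = knn_sets(X_high, k), knn_sets(X_low, k)
overlap = np.mean([len(set(a) & set(b)) / k for a, b in zip(nn_high, nn_low)])
print(f"global distance correlation: {r:.2f}, local {k}-NN overlap: {overlap:.2f}")
```

A metric built only from the first quantity is global in the sense used above; a demonstration of locality would have to involve something like the second quantity.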
As the authors propose a metric with a working principle to use all pairwise distances, I see further weaknesses in their evaluation setup with their choice of metrics and target dimensionality reduction.
* W2: Only a Few Quality Metrics are Considered in the Comparison.
The authors have not extended the strictly limited set of metrics against which they compare their proposed Saturn coefficient.
I see that the authors argue that they deliberately restricted the evaluation to UMAP, Continuity, and Trustworthiness.
This strictly limits the extent of the contribution, as the Saturn coefficient seems to be an effective metric for many other dimensionality reduction techniques as well.
Further, this still implies that the choice of HDBSCANess needs to be well-founded. The rationale presented by the authors is not convincing.
An extension of the metrics considered would provide broader insights into how and when the Saturn coefficient might be more beneficial to use, and would address the arguments on different classes of analysis approaches (local vs. global). In the end, it might be interesting how a global metric competes against local metrics and local dimensionality reduction techniques, but I deem it more essential to have a comparison against its immediate competitors: other global metrics.
As of now, this field has a large number of dimensionality reduction techniques and projection quality metrics, which is also argued by the authors.
Recent advances acknowledge that for a multi-objective optimization problem, such as dimensionality reduction and its downstream applications, it's an unsolved problem to capture the projection quality in a scalar value.
Here, I direct the authors to recent approaches to help data engineers and visualization engineers approach this large set of metrics [R15] and to be aware that metrics are a proxy to approximate underlying characteristics [R17].
In the end, I further argue that with a restriction to the local metrics Continuity and Trustworthiness (and HDBSCANess), we do not get a good-enough picture of the characteristics of the Saturn coefficient.
* W3: Low Continuity and Trustworthiness Measures are not discussed.
The authors clarify their use of the hyperparameter k for the local neighborhood of their applications of UMAP, which is important information for replicability in general.
I, however, asked for the (hyper-)parameters used for the competitor metrics Trustworthiness and Continuity, as their neighborhood size is usually set low.
As the authors have not provided these values, I had a look at the code the authors used for their evaluation: https://github.com/davidechicco/SaturnCoefficient_R_package/blob/main/SaturnCoefficient/R/Saturn_coefficient.r
The following line is the one that computes both metrics:
ContTrustMeasureOutput <- as.data.frame(ProjectionBasedClustering::ContTrustMeasure(original_matrix, umap_output_layout, ncol(original_matrix)))
There, the third argument ("ncol(original_matrix)") refers to the (hyper-)parameter I asked to be reported.
Unfortunately, I have to assume that this value is not a small neighborhood size, but the number of columns of the original matrix (i.e., its dimensionality), which would imply that the authors are not using these metrics per best practice.
As a result, the authors do not compare their proposed Saturn coefficient against meaningful uses of Trustworthiness and Continuity, reducing their evaluation to a comparison against HDBSCANess, which is also an underexplored metric with respect to the literature.
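The sensitivity to this parameter is easy to demonstrate; the sketch below uses scikit-learn's `trustworthiness` (not the R package from the authors' code) on toy data to show that the same projection yields different values as the neighborhood size k grows, so reporting k is essential:

```python
import numpy as np
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))                       # toy high-D data
X_2d = X[:, :2] + 0.1 * rng.normal(size=(200, 2))    # crude 2-D "projection"

# Same data, same projection, different neighborhood sizes k.
# Note: scikit-learn requires n_neighbors < n_samples / 2.
scores = {k: trustworthiness(X, X_2d, n_neighbors=k) for k in (5, 10, 50)}
for k, t in scores.items():
    print(f"k={k:2d}: trustworthiness={t:.3f}")
```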
* W4: Novelty of Metric is not Sufficiently Argued.
The authors clarified that their Saturn coefficient exhibits merit and is of quality, despite being simple in its design and implementation.
I don't argue against any merit of the proposed metric or its quality (cf. S1, S4, S5). I further think the authors identified a useful application of (Adjusted) RV in the area of projection quality metrics.
What I see is that we disagree on the threshold of novelty to warrant a given name.
As I identified, there are general approaches to analyzing the pairwise distances using matrix correlation measures.
The authors just happen to use Adjusted RV, which might be an actual improvement.
Unfortunately, the evaluation provided with the manuscript contributes no evidence that the Saturn coefficient might be an improvement over any other metric (cf. W2, W3, W10).
* W5: Choice of Adjusted RV is only Weakly Motivated.
The authors direct towards previous studies to verify their choice for the Adjusted RV.
They extended the manuscript such that it is more self-contained in this regard.
However, as the authors do use different datasets than the referenced previous studies, I only consider this to be a weak motivation for the Adjusted RV. I do think the Adjusted RV might be the best choice. I also highlight that there are more rigorous approaches that provide a stronger rationale. For example, state-of-the-art approaches include verifying formerly documented assumptions on newly introduced data and continuing from there.
* W6: Some Implementation Details Missing.
The authors fixed this issue.
* W7: The Used Distance Measures are Underexplored.
This weakness was not addressed in the revision.
* W8: Do the Authors Propose to Use HDBSCANess?
The authors clarify that the goal is not to optimize for closeness to HDBSCANess.
However, this introduces further ambiguity as to why this measure is used as the lone measure to validate the proposed metric.
As a side note: Upon validating whether HDBSCAN and/or the Adjusted Rand Index might be used as a projection quality metric in the literature, I mainly identified uses in different contexts, with one exception: [R18].
* W9: Unavailability or Ambiguity of HDBSCAN Clusters Resulted in Exclusion of a Dataset.
The authors explained their rationale to exclude the datasets, and their argument is reasonable.
This results in a strict, controlled study where all measures make sense and can be compared.
However, this still leaves much for future work and coincides with my former remark that the choice of HDBSCANess as the lone measure does not represent a rigorous comparison.
* W10: There is no Discussion on the Generalizability of their Findings
The authors haven't revised their manuscript regarding this weakness, and from their responses, I presume there is a misunderstanding.
Hopefully, I can dispel this as follows.
The authors effectively argue that their Saturn coefficient is generally applicable to assess the quality of a dimensionality reduction in one aspect. This is also acknowledged in my former review ("The authors could discuss further applications of the Saturn coefficient for general dimensionality reduction techniques").
However, I wanted to point out that the manuscript has no statistical evaluation and presentation on the generalizability of their results, e.g., that the Saturn coefficient is a suitable metric to assess projection quality as opposed to Trustworthiness or Continuity.
This would usually include establishing a hypothesis, a corresponding null hypothesis, the selection and discussion of a significance test, and the reporting of a p-value and confidence intervals. There might be other approaches to this, but the current manuscript lacks a reliability measure to quantify the "advantages of our proposed Saturn coefficient".
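As a sketch of what such reporting could look like, a paired significance test such as the Wilcoxon signed-rank test could be applied to per-dataset scores; the numbers below are invented purely for illustration and are not taken from the manuscript:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-dataset scores (NOT the authors' results).
hdbscaness = np.array([0.90, 0.75, 0.60, 0.85, 0.70, 0.95, 0.55, 0.80])
saturn     = np.array([0.85, 0.70, 0.65, 0.80, 0.60, 0.90, 0.50, 0.75])
trust      = np.array([0.40, 0.35, 0.30, 0.45, 0.25, 0.50, 0.20, 0.35])

# H0: the absolute deviation from the target metric does not differ
# between the Saturn coefficient and Trustworthiness.
err_saturn = np.abs(saturn - hdbscaness)
err_trust  = np.abs(trust - hdbscaness)
stat, p = wilcoxon(err_saturn, err_trust)
print(f"Wilcoxon statistic={stat:.1f}, p={p:.4f}")
```

Reporting the test choice, the p-value, and an effect size or confidence interval would let readers judge how far the observed advantage generalizes.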
Further, each design and conduct of an experiment has implicit or explicit, and known or unknown biases and errors that might pose a threat to the validity of their findings.
As an example, the chosen number of neighbors for Trustworthiness and Continuity poses a threat to validity, as a better chosen number would change the results. In this case, the number of neighbors was chosen poorly, such that it actually invalidates the results of the evaluation.
# New Minor Issues
With their revision, the authors introduced the following minor issues:
- Typo: "clusteirng"
# Conclusions
Summarizing, the changes made to the manuscript do not resolve the majority of the identified weaknesses. Worse, the clarification concerning the number of neighbors for Trustworthiness and Continuity even invalidated the evaluation of the Saturn coefficient.
In conclusion, I still cannot recommend accepting this manuscript for publication.
I hope the authors further consider the identified weaknesses and continue improving their manuscript.
I'd really like to see the set of metrics for projection qualities expanding, as there is still a lot to learn to achieve better projections of our data to work with [R16].
# Additional References
* [R15] Bae et al.: "Metric Design != Metric Behavior: Improving Metric Selection for the Unbiased Evaluation of Dimensionality Reduction". To Appear in Proc. IEEE VIS 2025. arXiv preprint doi:10.48550/arXiv.2507.02225
* [R16] Jeon et al.: "Stop Misusing t-SNE and UMAP for Visual Analytics", arXiv preprint doi:10.48550/arXiv.2506.08725
* [R17] Machado et al.: "Necessary but not Sufficient: Limitations of Projection Quality Metrics". Computer Graphics Forum 44(3). doi:10.1111/cgf.70101
* [R18] Jin et al.: "Unraveling Scientific Evolutionary Paths: An Embedding-Based Topic Analysis". IEEE Transactions on Engineering Management 71. doi:10.1109/TEM.2023.3312923
Minor typos on line 357 of the revised manuscript; otherwise, no comments beyond my initial review.
-
-
I'd like to thank the authors for taking the time to address my comments as well as the comments of the other reviewers. I appreciate the added discussion to the manuscript and feel that they, for the most part, address my comments/questions regarding the interpretation and use of the score in more complex data scenarios. I would have really liked to see some additional experimentation, especially around hyperparameter tuning, but I believe the authors have made a strong effort towards explaining the method and limitations, as well as clarifying the interpretation and use cases.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
The language of the manuscript is acceptable, but could use further proofreading passes.
The Background is not sufficiently provided and discussed.
The article structure is good, but the tables and figures are sometimes hard to follow when they're scattered across pages.
More details are discussed in the additional comments.
The research question is well defined, relevant, and meaningful.
The knowledge gap is identified, and I could imagine that the proposed research might fill this gap.
The experimental design needs revision and clarification.
Methods are mostly described with sufficient detail, but still could use further details.
More details are discussed in the additional comments.
-
The authors propose a metric to measure the quality of the UMAP dimensionality reduction technique.
Their metric - the Saturn coefficient - captures the correlation in distances in the higher-dimensional space and the lower-dimensional space.
The authors evaluate their metric using a benchmark on synthetic and real-world datasets in comparison to three further quality metrics: Trustworthiness, Continuity, and HDBSCANess.
# Strengths
The approach of the authors has some strengths I want to highlight.
* S1: The Saturn Coefficient is Easily Described and Implemented.
* S2: Evaluation Using Both Synthetic and Real-World Datasets.
* S3: The Use of HDBSCANess as Additional Quality Metric.
* S4: Saturn Coefficient Indicates Higher Values for Seemingly Good UMAP Layouts than the Comparison Metrics.
Despite often low values for Trustworthiness and Continuity, the proposed Saturn Coefficient provides higher measures for visually acceptable UMAP layouts.
* S5: Source Code is Available Open Source, and Packages are Available on CRAN and PyPI.
# Weaknesses
However, the implementation and manuscript also have several weaknesses that should be addressed to greatly improve clarity, soundness, and rigor.
* W1: How Does the Saturn Coefficient Capture Both Global and Local Structures Simultaneously?
How does the Saturn coefficient, taking into account all distances among all points, and thus corresponding to the class of global measures, also capture local structure?
Why do the authors propose to use a global measure (all distances among all points) to assess the quality of UMAP, a dimensionality reduction technique that optimizes for local neighborhood?
By design, the goal of UMAP is not the retention of distances, but the retention of neighborhoods.
* W2: Only a Few Quality Metrics Considered in the Comparison.
Quality metrics are a well-studied field in Visualization, and for scatterplots based on dimensionality reductions, there are several works that survey them, apply them, and compare them.
I direct the authors to the works of Nonato & Aupetit and Lee & Verleysen for a starting point [R4,R6].
I further direct the authors to further studies by Espadoto et al. [R2] and Atzberger et al. [R3], where both included an evaluation using UMAP and using further metrics to assess projection quality.
Even further studies may provide more foundation for the evaluation of the authors [R8,R9,R10,R11,R12,R13,R14].
* W3: Low Continuity and Trustworthiness Measures are not Discussed.
While having a look at the low values for Continuity and Trustworthiness, I found no discussion by the authors on why the values can be expected to be low (e.g., given an intuition using the definition).
Moreover, I found that the authors have not documented the used value(s) for the shared hyperparameter of these two metrics: k, as the number of neighbors considered as local neighborhood.
In the literature, this value is usually chosen to be low, e.g., 5 or 7, but from one argument of the authors - "involves [...] all the data points" - I'm wondering if this number was chosen quite high.
* W4: Novelty of Metric is not Sufficiently Argued.
The authors present their Saturn coefficient to be easily constructed and computed, and indeed, the main analysis is the use of AdjustedRV, which is a state-of-the-art technique.
Moreover, the authors only need five lines of code to implement their metric, while combining only basic R packages (base, stats, MatrixCorrelation) using no additional logic (cyclomatic complexity of 1).
The authors should clarify how this justifies coining this approach the "Saturn coefficient" and present this as a novel metric.
But coming from a neighboring field of study, quality metrics of dimensionality reductions, I note that the consideration of pairwise distances in high-D and low-D has also been explored and evaluated there.
For example, the general approach to correlate distances in high-D and low-D is also present in the Spearman Rank Correlation [R3] that is used together with a Shepard Diagram [R5,R7].
The authors should discuss this as related work and ideally extend their evaluation accordingly.
I have the impression that the used RV/AdjustedRV might be a generalization of the Pearson Correlation and the Spearman Rank Correlation when applied to the Shepard Diagram, but I suppose this argument is best constructed and validated by the authors.
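For illustration, the Shepard-diagram-style rank correlation mentioned above can be computed in a few lines (toy data, not the authors' datasets), which makes it an easy additional baseline for the evaluation:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
X_high = rng.normal(size=(150, 8))           # toy high-D data
X_low = X_high @ rng.normal(size=(8, 2))     # toy linear projection to 2-D

# Shepard-diagram-style comparison: rank-correlate the two vectors of
# pairwise distances (high-D vs. low-D).
rho, _ = spearmanr(pdist(X_high), pdist(X_low))
print(f"Spearman rank correlation of pairwise distances: {rho:.3f}")
```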
* W5: Choice of AdjustedRV is only Weakly Motivated
The authors use an argument by observation (with little documented evidence) instead of an argument by characteristics/structure of the different RV variants.
I'd like to see a more extensive argument on the choice of AdjustedRV, ideally following tests and corresponding reporting using the other RV variants.
* W6: Some Implementation Details Missing
In formula (1), the used scaling is missing information on whether there is a uniform scaling or a per-dimension scaling. It is just stated that "all the columns of X only contain values between -1 and +1", which would be true in both cases. There might be further variants of scaling, adding to the ambiguity.
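A minimal numeric illustration of the ambiguity: both variants below satisfy the quoted statement that "all the columns of X only contain values between -1 and +1", yet they produce different matrices and thus different pairwise distances:

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 40.0],
              [3.0, 20.0]])

# Variant A: one uniform scale factor for the whole matrix.
uniform = X / np.abs(X).max()

# Variant B: each column scaled independently.
per_column = X / np.abs(X).max(axis=0)

print(uniform)     # columns keep their relative magnitudes
print(per_column)  # each column now spans up to +/-1 on its own
```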
* W7: The Used Distance Measures are Underexplored.
While Euclidean distances might be sensible for lower-dimensional datasets (e.g., the synthetic data used in the experiments), higher-dimensional datasets are more often analyzed using different semantics of distance, e.g., cosine similarity. This holds for both HDBSCAN [W8] and UMAP [W9].
As the authors also acknowledge their use, I'd expect more discussion on their exclusion from the tests.
* W8: Do the Authors Propose to Use HDBSCANess?
While the authors optimize for closeness to HDBSCANess, I miss further details on why this metric should be the optimization target.
I want to assume that this metric does not represent the actual goal, because otherwise, we could just use HDBSCANess.
As far as I can tell from the manuscript, the proposed Saturn coefficient would alleviate the need for hyperparameter optimization of HDBSCAN.
Another open question is whether cluster retention of a high-dimensional dataset is an actual target.
This argument leads towards the current, broad landscape of dimensionality reduction techniques, distance and similarity measures, and quality metrics.
Further, HDBSCAN itself is a neighborhood-assessing clustering algorithm [W10], so a high HDBSCANess (a large retention of node-cluster relationships after dimensionality reduction) in combination with UMAP, a nonlinear dimensionality reduction, primarily reflects neighborhood preservation.
Why should a global quality metric, such as the proposed Saturn coefficient, correlate with a metric based on a neighborhood-assessing clustering algorithm?
Strangely enough, and as argued before, the neighborhood-assessing metrics Trustworthiness and Continuity do not seem to correlate with HDBSCANess... This is an interesting starting point for a more extensive evaluation.
* W9: Unavailability or Ambiguity of HDBSCAN Clusters Resulted in Exclusion of a Dataset.
Unfortunately, all datasets where HDBSCANess could not be computed or where there were differences in the number of clusters were excluded from further analysis.
As I read from the manuscript, the actual quality metrics, Trustworthiness, Continuity, and the proposed Saturn coefficient could be computed without error.
Can they be reported either way?
Could the authors think about expanding the list of target metrics, such that there are multiple / even more targets to discuss?
* W10: There is no Discussion on the Generalizability of their Findings
For one, the authors don't report on statistical significance.
Further, the authors don't discuss threats to validity.
# Minor Issues
Lastly, some minor issues should be easily resolvable.
* M1: How is the Proposed Saturn Coefficient Specific to UMAP?
The authors could discuss further applications of the Saturn coefficient for general dimensionality reduction techniques.
Moreover, I could think of applying such a metric to measure the sensitivity or stability of dimensionality reduction techniques [R3].
* M2: Argument to Dismiss the Studies [15] and [16] is Insufficient.
How is the argument "were not intended for UMAP" applicable when both approaches were published over 8 years before UMAP?
What if they were applied either way, assessed as well, and the results reported alongside other metrics?
I think this would add to the impact of the manuscript.
* M3: Introductory Examples for 1D, 2D, and 3D Visualizations are Ambiguous and Misleading.
Why not use scatter plots throughout, maybe also including color?
I mainly stumbled over the use of heatmaps, as they drift away from scatter plots towards quantized/binned visualizations.
* M4: Why are there two names for the proposed approach: Saturn Coefficient and Saturn Score?
Just from the semantics of coefficient and score, I'd lean towards score.
* M5: It is unclear how the Saturn coefficient is "related to the Venus score".
The Venus score is a set of ten questions to increase the quality of datasets, which is not trivially similar to matrix correlations as proposed with the Saturn coefficient.
* M6: A Recent Survey was Not Discussed.
I direct the authors to the survey by Hyeon Jeon et al., recently published at CHI'25 [R1], which could provide a further overview and motivation for the overarching topic of dimensionality reduction.
* M7: Needed Revision of Figure 1.
For one, this figure is excessively large.
As another issue, the annotations (a), (b), ... are not used in the figure caption or throughout the manuscript.
* M8: Inconsistency in Report on Synthetic Datasets.
In the introduction of section 3, the authors write about "eight artificial datasets".
Later, e.g., in subsection 3.1, the number of datasets is actually "four".
This is further reaffirmed in Figures 2 and 3 and Table 1.
Further, there seems to be a mismatch in the naming of the chameleon dataset, which is sometimes (t) and sometimes (c) (cf. Table 1, Figure 4).
* M9: Cluster Colors for Original Data Missing.
For Figures 2 and 3, it would be nice to see the cluster coloring as well. For example, this would allow us to get an idea of which part of (t1) morphed to which in (t2) when applying UMAP.
* M10: Phrasing Issues:
- "openly available" -> "freely available"
- mixed "et al.", "and colleagues", and "and collaborators"
* M11: Typos:
- "dimennsionality" -> "dimensionality"
- "chameleon data Figure 3)" -> "chameleon data (Figure 3)"
* M12: Use of References as Objects in a Sentence
- "The article [14]" - The main style of the manuscript seems to avoid using references as subjects/objects in sentences
# Conclusions
Summarizing, I see some merit in the works of the authors, their prepared manuscript, and published software.
However, I also see a larger number of weaknesses and further minor points that should be addressed.
As of now, I cannot recommend accepting this manuscript for publication.
I hope to see the identified issues addressed and my questions answered in a revision of this manuscript.
# Further References
* [R1] Jeon et al., 2025: "Unveiling High-dimensional Backstage: A Survey for Reliable Visual Analytics with Dimensionality Reduction". Proc. ACM CHI '25, Article 394, 24 Pages. doi:10.1145/3706598.3713551
* [R2] Espadoto et al., 2021: "Toward a Quantitative Survey of Dimension Reduction Techniques". IEEE Transactions on Visualization and Computer Graphics 27(3):2153-2173. doi:10.1109/TVCG.2019.2944182
* [R3] Atzberger et al., 2025: "A Large-Scale Sensitivity Analysis on Latent Embeddings and Dimensionality Reductions for Text Spatializations". IEEE Transactions on Visualization and Computer Graphics 31(1):305-315. doi:10.1109/TVCG.2024.3456308
* [R4] Nonato & Aupetit, 2019: "Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment". IEEE Transactions on Visualization and Computer Graphics 25(8):2650-2673. doi:10.1109/TVCG.2018.2846735
* [R5] Cutura et al., 2020: "Comparing and Exploring High-Dimensional Data with Dimensionality Reduction Algorithms and Matrix Visualizations". Proc. ACM Advanced Visual Interfaces, Article 10, 9 Pages. doi:10.1145/3399715.3399875
* [R6] Lee & Verleysen, 2010: "Scale-independent quality criteria for dimensionality reduction". Pattern Recognition Letters 31(14):2248-2257. doi:10.1016/j.patrec.2010.04.013
* [R7] Joia et al., 2011: "Local Affine Multidimensional Projection". IEEE Transactions on Visualization and Computer Graphics 17(12):2563-2571. doi:10.1109/TVCG.2011.220
* [R8] van der Maaten et al., 2009: "Dimensionality reduction: a comparative review". Technical Report TiCC TR 2009–005, Tilburg centre for Creative Computing, Tilburg University. url: https://lvdmaaten.github.io/publications/papers/TR_Dimensionality_Reduction_Review_2009.pdf
* [R9] Gisbrecht & Hammer 2015: "Data visualization by nonlinear dimensionality reduction". WIREs Data Mining and Knowledge Discovery 5(2):51-73. doi:10.1002/widm.1147
* [R10] Tian et al., 2021: "Quantitative and Qualitative Comparison of 2D and 3D Projection Techniques for High-Dimensional Data". MDPI Information 12(6), Article 239, 21 Pages. doi:10.3390/info12060239
* [R11] Gove et al., 2021: "New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation". Elsevier Visual Informatics 6(2):87-97. doi:10.1016/j.visinf.2022.04.003
* [R12] Castelein et al., 2023: "Based Quality for Analyzing and Exploring 3D Multidimensional Projections". Proc. SciTePress IVAPP '23, pp. 65-76. doi:10.5220/0011652800003417
* [R13] Tian et al., 2023: "Measuring and Interpreting the Quality of 3D Projections of High-Dimensional Data". VISIGRAPP 2023, pp. 348-373. doi:10.1007/978-3-031-66743-5_16
* [R14] Benato et al., 2023: "Measuring the quality of projections of high-dimensional labeled data". Elsevier Computers & Graphics 116:287-297. doi:10.1016/j.cag.2023.08.023
* [W8] https://scikit-learn.org/stable/modules/generated/sklearn.cluster.HDBSCAN.html, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html#sklearn.metrics.pairwise_distances (search term: "cosine")
* [W9] https://umap-learn.readthedocs.io/en/latest/parameters.html (search term: "cosine")
* [W10] https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html (search term: "minimal spanning tree")
**PeerJ Staff Note:** It is PeerJ's policy that any additional references suggested during peer review should only be included if the authors find them relevant and useful.
-
A separate section for experimental setup could be shown, with all the parameters used, which could help others to reproduce your work.
Comparison with existing techniques could be mentioned, including the parameters with which your work is assessed against existing techniques, along with substantial proof and detailed explanations showing that your approach outperforms existing ones.
-
There are bugs; I have sent code.
The metric seems to be capturing global structure.
The current Python implementation of the Saturn coefficient fails and returns NaN values when tested on standard datasets like MNIST. This suggests a critical bug or a lack of input validation. Proper error handling and documentation are urgently needed to ensure reproducibility. No GitHub repository is linked.
- No issues with overall writing/reporting, the text is overall clear.
- It might be helpful to provide an exact definition of the adjusted RV formula.
- One point of clarity is that under the section of the introduction titled “our proposal: a trade-off between global and local structures preservation,” there isn’t much explanation of how the Saturn score is calculated, and the abstract also does not specifically mention this. The introductory materials do a good job of framing where Saturn fits in as a method, but it would be useful to add a couple of qualitative sentences in the abstract/intro briefly describing the method itself or how it is calculated to give readers a more concrete sense of what is being proposed.
- It would be helpful to have more explanation regarding comparing the HDBSCANess score and Saturn score via absolute difference, or for example, how might a user interpret a Saturn score of 0.7 versus a HDBSCANess score of 0.7. It's not clear to me that these two scores are on the same scale, or whether a Rand index score between cluster labels can be directly compared to an RV correlation between distance matrices in this manner. This is highlighted in the situation of Figure 2.s2, where the UMAP clusters are clearly well-defined, but the Saturn score of 0.175 diverges significantly from the HDBSCANess of 0.990.
- Expanding on the point of interpretation of the score, the proposal to use the Saturn score as an objective way to tune dimension reduction hyperparameters is a great application and potentially a very valuable tool where ground truth labels are unknown, as is usually the case. To give a better idea of what the Saturn score represents, it could be really informative to focus on one experiment, and then visualize the relationship between varying the UMAP parameters, seeing the projection, and seeing how the score changes with the projections. Again, I think this is a very valuable contribution of the method, and it would be great to see such a case study in practice.
- I agree with the authors that unsupervised learning is naturally used in settings where there are no ground-truth labels, and their HDBSCAN experimental setup does in some ways mirror typical data analysis scenarios, such as analysis of RNA data. However, for the purposes of establishing a new metric, I do think that it could be valuable to benchmark against datasets with known labels, even as simple as something like MNIST. Since the HDBSCANess score is essentially a clustering score, it seems to me that in the case of known data labels, it would be similar to also compare the Saturn score against something like a silhouette score or even a Rand index against the true labels. Would it be possible to compare the score in applications with ground truth labels?
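A labeled-data benchmark along these lines could be sketched as follows; here `digits` stands in for MNIST and PCA for the dimensionality reduction, so this is only a template under those assumptions, not the authors' pipeline:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Labeled dataset with known ground truth (10 digit classes).
X, y = load_digits(return_X_y=True)
X_2d = PCA(n_components=2, random_state=0).fit_transform(X)

# Cluster the 2-D layout and compare against the true labels.
labels_2d = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_2d)
ari = adjusted_rand_score(y, labels_2d)   # agreement with ground truth
sil = silhouette_score(X_2d, y)           # separation of true classes in 2-D
print(f"ARI={ari:.3f}, silhouette={sil:.3f}")
```

Reporting the Saturn score for the same layout next to these label-based scores would show whether it tracks the quality that ground truth reveals.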
- Just for the purposes of discussion, it could be useful for the paper to add some discussion on a couple of points. The first is that the Saturn score considers the pairwise distances in a strict manner when comparing against the projected representation. However, the UMAP objective primarily focuses on local relationships and does not directly optimize for preserving the same global cluster separations. This distinction may result in the Saturn score often being lower on UMAP projections and could be useful to discuss for interpreting the Saturn score for methods such as UMAP versus structure-preserving dimension reduction methods like classical MDS or PCA. The other point of discussion I think that would be useful is for situations where Euclidean pairwise distances may not be the best representation of the data, such as the spiral clusters presented in Figure 2, and whether the method could be extended in these cases.
Overall, the authors do a good job of concretely establishing an objective measure of a dimension reduction method's ability to preserve distances in an overall well-principled manner, and providing software packages to easily compute it. The use of Saturn as a tool for hyperparameter tuning is a valuable contribution, which I think the authors could highlight further by performing a case study like the one mentioned above, which would also really help interpret what the Saturn score represents in practice. It would also be great if the authors could add some discussion regarding the interpretation of the score, since there seems to be some differences between the structure of the Saturn score and the UMAP objective function.