Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

Summary

  • The initial submission of this article was received on April 23rd, 2021 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on June 6th, 2021.
  • The first revision was submitted on August 3rd, 2021 and was reviewed by 3 reviewers and the Academic Editor.
  • A further revision was submitted on September 28th, 2021 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on September 30th, 2021.

Version 0.3 (accepted)

· Sep 30, 2021 · Academic Editor

Accept

Thank you for incorporating the recommended changes. The revised version of the manuscript is now in good shape and can be published.

Version 0.2

· Aug 26, 2021 · Academic Editor

Minor Revisions

Based on reviewers’ comments, you may resubmit the revised manuscript for further consideration. Please consider the reviewers’ comments carefully and submit a list of responses to the comments along with the revised manuscript.

·

Basic reporting

The article is mostly the same compared to the last iteration.
Minor improvements have been made.

Regarding related work:
Given the new Example 3.2.2, the important related work of "2D Communication Views" should be discussed in the article.
2D communication matrices have been used in Vampir and other tools, e.g.
https://www.researchgate.net/profile/Michael-Wagner-45/publication/284440851/figure/fig7/AS:614094566600711@1523422962550/Vampir-communication-matrix-taken-from-GWT14_W640.jpg
https://apps.fz-juelich.de/jsc/linktest/html/images/linktest_report_matrix.png

As it stands the hypothesis that the view is effective for performance analysis is not sufficiently supported.

Some replies to modifications made:
I agree with most of the changes made.
> Regarding the "x = 0 plane", the reader doesn't need to worry too much about finding
I believe it is then unwise to mention this in Line 216. I spent quite some time trying to find the network information in the figure. Is it perhaps the horizontal bar at the end? What can we see there in the first place? This description should be removed, since nothing useful can be seen in this figure. It is also discussed for Figure 7, where it makes sense.

The lack of syntax highlighting in the code presentations in Figure 13 is not acceptable to me. In Figure 3 you could at least use bold font to improve readability a bit. The coloring makes sense, but the ifdefs could then perhaps be removed to save space, since they add no new information.

Experimental design

The experimental design still has a major weakness: it does not answer the question of whether the 3D visualization is effective for analysis.

The example in Section 3.2.2 Industrial CFD Code is not convincing to show that the 3D visualization is superior to a 2D visualization.
I would claim that the issue pointed out here (too many comms and empty ones during the iterations) can be visualized using 2D communication views and can even be seen better there.
Authors from the same team have claimed that those visualizations are effective for identifying such issues.
You must include the 2D visualization as related work and show that your example is more effective -- differentiate your work from this analysis method that had been used effectively in the past. I'm convinced you know about these methods and wonder why they haven't been included and discussed here.
If it isn't a good example, then you need to find a better example or do a quantitative study.

Validity of the findings

The paper is a minor increment from the authors' last paper version.
Generally, I believe the 3D visualization has its merits, however, the paper is not yet convincing.

Additional comments

Given the unsatisfactory reply of the authors to the key issue, the minimal changes made in the article for resubmission, and the issues raised by reviewers regarding the contribution, I now have more reservations about the article than before. Is there no reasonable example that shows the benefit of 3D visualization? If it is just a matter of time to create such an example, then bring it on, particularly as your last article included quite a few aspects of this paper. To me, the novelty of this article would be to demonstrate the merits of 3D visualization.

At the moment, the contribution remains too weak for me to justify publication in the journal - it would be OK for an applied workshop. Please continue the interesting research and demonstrate to the readers the merit.

·

Basic reporting

After reading the diff file sent by authors, I have found no significant typos in the text. The issue in the text that I have previously identified has been fixed. Authors have improved the discussion and the differences against other works. Raw data, whenever possible (copyright issues), has been made available through the DOI 10.25532/OPARA-119. The issue of lines 97 and 98 has been fixed by citing ParaProf directly.

Experimental design

The experimental design (factors, response variables, replications) is very synthetic in the sense that only a few factors and response variables are involved. That being said, such simplistic experimental design seems sufficient to generate the study cases to evaluate the approach, so it's okay.

Validity of the findings

My first major concern (summary of my previous review) was "discussion of the differences against related work (other attempts that depict very similar views of this contribution's topology view)". Authors very briefly compare themselves against RW in Section 1 ("Related Work"), arguing that the main drawback of existing solutions is that they have developed a 3D viewing tool on its own instead of using an existing generic visualization tool as authors of the submitted manuscript do. So, the argument is purely implementation/technical and not scientific. I would rather have preferred a discussion about how the 3D views of the current work differ from the 3D views that have been implemented in those previous works, from the perspective of a performance analyst. Just to give you an idea, a very simplistic 3D topology/hardware view has already been proposed in the Triva tool. What is the difference between those views and yours? The V1 version has shown no more elaborate discussion about this specific point. So my first concern remains.

Authors have improved the results section to report the back and forth of the performance analysis/optimization with the real application. So I feel the result is more consistent now. Consequently, my second concern about usefulness is satisfied. In addition, raw data has been made available (DOI). It includes PDF files with the SLURM output (containing the measurements for the evaluation tables) and VTU files for the visualization.

Additional comments

I understand that many comments are a matter of style; I leave it to the editor to decide whether they are important or not for this journal.

Reviewer 3 ·

Basic reporting

The paper is easier to read after the revision because most of the explicit criticism regarding the writing has been addressed.

One of the issues I mentioned that has not been addressed is the following:
>>I found a *central* part (lines 65 to 79) of the introduction not to be well understandable. What does the "flipping" mean in detail? What does "performance ... inside in situ" mean in contrast to "in situ inside performance"?<<

I did reread this part and still have trouble understanding the actual meaning. So I believe other readers will have the same problem. The authors should try to reformulate this or provide additional information/description/explanation on its meaning and impact.

Further, changing "for decades" to "a long time" does not eliminate the need for reference(s) to support this claim.

Regarding the color bar labels: Their format *can* be changed in ParaView: see 6.3 and 10.2.4 here: https://www.paraview.org/paraview-downloads/download.php?submit=Download&version=v5.8&type=data&os=Sources&downloadFile=ParaViewGuide-5.8.1.pdf Also the placement of the color bar can be influenced.

Experimental design

In this revision, the experiments have been extended and do show the usefulness of the method now.

Validity of the findings

No additional comments.

Additional comments

The paper has been significantly improved in this revision. This brings it close to being ready for publication. I recommend accepting the paper after a *minor* revision addressing my new comments.

Version 0.1 (original submission)

· Jun 6, 2021 · Academic Editor

Major Revisions

Considering the reviewers' recommendations, the article cannot be published in its current form. I hope you will revise the article to address all the raised concerns and submit an improved version soon. Best of luck!

[# PeerJ Staff Note: Please ensure that all review comments are addressed in a response letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.  It is a common mistake to address reviewer questions in the response letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the response letter.  Directions on how to prepare a response letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]

·

Basic reporting

The text is generally easily readable fulfilling the needs of the journal.
There are some presentation issues that should be improved though.

1) Abstract
It reads a bit awkwardly and should be rewritten, particularly regarding the relationship to the previous publications.
I wouldn't start with "This paper continues the work initiated" but rather with the motivation (2nd sentence).
Then I do not care which paper there was. I would say: In previous work, we developed XX and YY.
In the introduction, you can clarify what exactly you did compared to previous work (and you actually did so).

"also in codes without in situ" => In this article, the tool is extended allowing users to visualize performance data on a 3D representation of the hardware resources ... Visualization can be performed post-mortem or in situ using Catalyst...

2)
There are sections where various commas could be added after introductory clauses; they should be added for readability.

3)
The presentation of listings in Figures 3, 4, 12 should be improved.

4) Evaluation
There surely is plenty of information in Figures 6+. The description feels initially a bit rough and the organization should be improved.
The first three images are presented and then explained. I felt a bit left alone trying to understand various information from the first paragraph but then couldn't figure it out. I would reorganize this text a bit, first describe the concept, Line 216ff. Help readers to understand, guide them better through even the first image, then go to the next.
I would absolutely not start with Fig 6 as there is way too much info here.

5) There are often too many sections in (), you can try to shorten sentences and rather break them into two.

Experimental design

The experimental design validates the new presentation mode using standard benchmarks.
The design itself is reasonable given the nature of this article.
However, the presentation of it shall be improved (see notes).

Validity of the findings

The approach is sufficiently novel that it deserves publication. I can see that there are cases where this is useful.

However, to me, the article reads a bit like a manual instead of a research paper. I would have been intrigued if a study had been made that actually shows that this presentation is beneficial for users. I know that would not have been easy.
But as there can be X-papers that claim arbitrary new ways of presentation, the question is, is the given presentation useful. Personally, I find it potentially helpful but also quite cluttered with messages. I would be interested in qualitative analysis.

The conclusions are mostly well stated, the key issue is sometimes to assess "effectivity" of a presented result. The analysis of the figures is rather subjective and hence a bit speculative and not as clear as the article states. I would welcome more discussion about the quality of the approach. For instance, saying something like "It is effective to identify X", the question is why couldn't this be analyzed in existing tools? Maybe another (sub)section can be added that discusses the findings critically and puts them into the perspective of existing tools.

Additional comments

33. "high-tuned" => tuned
40. remove () around generated...
48. Figure 1 should include "coprocessor" (it isn't actually shown!)
75. "None of the existing performance analysis tools incorporates sophisticated output visualization." => Are you sure you want to keep this statement? I believe Vampir does have somewhat sophisticated vis...
81. (Hydra) => Hydra
99. (in the hyperlink) => But this is not linked; not good to have it hidden, remove this reference. I would prefer also a proper reference to ParaProf. Much better would be to use footnotes, a practice you use on page 4... Be consistent!
162. The selected functions.... => this sentence isn't clear, add one more. Clarify why "independently". Some readers are not aware of how this works...

The presentation of listings in Figure 3 and 12 should be improved.
Figure 4. Remove extra whitespace.

181. "statistical oscillation of the run time too" => clarify the footnote.
211. "x = 0 plane" => I struggle to find x=0 in the figure.

Fig 6 caption => "with different [visualization] parameters"
How do they actually differ?

Fig 10 "and not (bottom)" => "or not (bottom)"

I find the order of Table 1 + 2 not intuitive, I would have started with baseline, then +Score-P, then +plugin.
I would get rid of the awkward naming "++plugin" in Line 277.
It isn't clear from the table that "running time" includes the relative overhead as well.
Again: if you start with baseline, you could just say overhead "12%", because nobody needs the actual runtime for these anymore.

330. Use a footnote for the GitHub link.

·

Basic reporting

English is okay, I have found no significant typos in the text, except perhaps in Line 235 where the phrase could be clarified. The authors cite literature correctly although further discussion about the differences is missing (see my complete review below, in the "General comments for the author" box). Raw data about overhead measurements and collected traces are unavailable, so authors could make them available through some perennial archive such as Zenodo (from CERN). Content is not fully self-contained because in the Related work section authors refer to the VI-HPS (lines 97 and 98) using a URL link and not scientific references. The reviewer went to that website and failed to see "an extensive list of them [many software tools]". Perhaps citing the works individually would be better.

Experimental design

My concern here is about the variability of the results (see my complete review below, in the "General comments for the author" box). The method for the overhead analysis is not fully described. How were the experiments conducted: were they randomly sorted prior to execution? What is the experimental design (using the terminology of Jain's 1991 book, ISBN: 978-0-471-50336-1)?

Validity of the findings

Although the software has been provided in a lab's gitlab installation, no raw data (traces, measurements, scripts) is available. See my other comments about the validity of the findings in my complete review below, in the "General comments for the author" box.

Additional comments

The article "Further Enhancing the in Situ Visualization of
Performance Data in Parallel CFD Applications" presents results on
using scientific visualization tools such as ParaView to depict
performance metrics collected from CFD codes. The work is possible
through a connection between a Score-P plugin, capable of collecting
performance data, and Catalyst, from ParaView, for visualization. The
work described here is incremental because the authors' previous work
has not dealt with the mapping of communication performance metrics and
the simulation's geometry. The base idea relies on combining the 3D
visualization of the computational resources with the performance
metrics of the application executing on that platform. Validation is
carried out with MG and BT from NAS, as well as with Hydra, which is okay
because it has both known benchmarks and a real HPC application.

_Related Work_

The same authors have already published articles with very
close contributions to the topic of this submission, including the
reuse of the same figures (that are repeated from these previous
publications), here:

2019
https://link.springer.com/chapter/10.1007/978-3-030-48340-1_31 "In
Situ Visualization of Performance-Related Data in Parallel CFD
Applications" Rigel F. C. Alves, Andreas Knüpfer

and here:

2020
https://superfri.org/superfri/article/view/317 "Enhancing the in Situ
Visualization of Performance Data in Parallel CFD Applications" Rigel
F. C. Alves, Andreas Knüpfer. http://dx.doi.org/10.14529/jsfi200402

The difference of this submission against those ones is basically
stated in a paragraph in the introduction. In 2019, authors map
metrics from code regions. In 2020, authors map metrics from
communication operations. In these past contributions, performance
metrics are always mapped to the simulation's geometry (the object
being simulated by the HPC application). In this submission, authors
replace such a simulation's geometry by a 3D object that depicts
the computational resources.

The idea in itself (performance metrics on top of a 3D representation
of the computational resources) remains without novelty because
several other works in the past have already proposed such
features. Some of them are cited in this work (second paragraph), but
authors fail to state the differences of this submission against them.

The limitation of Score-P/Vampir regarding the tracking of an
application's iterations, paramount to many performance analysis
activities, has been alleviated by manually instrumenting the
application code and inserting a region called "Iteration". Thanks to
the fact that regions can be stacked, with a bit of data science
manipulation, you can classify all events per iteration in a
post-processing phase. Refer to these publications where this method
has already been employed:
1. Temporal Load Imbalance on Ondes3D Seismic Simulator for Different
Multicore Architectures Ana Luisa Veroneze Solórzano, Philippe
Olivier Alexandre Navaux, Lucas Mello Schnorr.
http://hpcs2020.cisedu.info/4-program/processed-manuscripts
2. Optimization of a Radiofrequency Ablation FEM Application Using
Parallel Sparse Solvers. Marcelo Cogo Miletto, Claudio Schepke,
Lucas Mello Schnorr.
http://hpcs2020.cisedu.info/4-program/processed-manuscripts
So, when you write "We will then advance the state-of-the-art by
introducing tracing per time-step itself." note that many others have
already been doing a very similar operation (from the implementation
perspective) as a data science procedure.
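
The post-processing step described above can be sketched roughly as
follows. This is only an illustrative sketch and not the method of the
cited publications: the Event record, its field names and the
label_by_iteration helper are hypothetical, and no real Score-P or
OTF2 API is used. It assumes a trace already exported as a flat list
of enter/leave events in which the manually inserted "Iteration"
region brackets each time-step.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical flat trace record (not a Score-P/OTF2 type)."""
    time: float   # timestamp of the event
    kind: str     # "enter" or "leave"
    region: str   # name of the instrumented region


def label_by_iteration(events, marker="Iteration"):
    """Pair every event with the index of its enclosing marker region.

    Events that occur outside any marker region get the index None.
    The marker's own enter/leave events are consumed, not returned.
    """
    labelled = []
    current = None  # index of the iteration that is currently open
    count = 0       # number of iterations opened so far
    for ev in sorted(events, key=lambda e: e.time):
        if ev.region == marker:
            if ev.kind == "enter":
                current = count
                count += 1
            elif ev.kind == "leave":
                current = None
            continue
        labelled.append((current, ev))
    return labelled
```

With such labels, per-iteration metrics can be aggregated and compared
in any data analysis environment, independently of the visualization
tool.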

_Results_

Generally, the 3D views on Figures 6, 7, 8, 9, 10, 11 are too
dark. They should have a white background or a lighter background.

The compute node id numbers on the right-hand side of Figure 6
(referred to in Lines 216-217) use a gradient color, but note that
there is no node with id number 1.5 or 2.5 (as shown in the respective
legend). Perhaps using fixed colors such as the left of Figure 6 to
identify compute nodes would be better in a small scale scenario
relying on gradient colors only when you don't have enough fixed
colors to depict (with some sort of threshold for this). In a general
sense, I had some trouble understanding this association (gradient
color and id number), so perhaps adding some manual annotation in the
figures with arrows and letters could help the reader to identify
precisely what you want the reader to see, or stating "gradient color
legend" in the text would help. About the fixed colors, the order on
which they appear in the legend should be the same order they appear
in the 3D object, so, from top-to-bottom, Machine, Package, L3, L2,
L1d, Core, PU. It is clear here that you disregard hyperthreading in
the visualization, but probably you are representing only physical PU.

The "Results" section (Sec 3.2) presents the 3D views and discusses
their capabilities. Most of section 3.2.1 (NPB-oriented) presents
technical aspects of the visualizations themselves to make the reader
understand and interpret them (the choice of depicting outwards
messages downward in the view, the choice of depicting compute nodes
as places and their inherent internal structure from the HW hierarchy,
and so on). The only parts where the authors do the performance
analysis activity itself appear at the very end of Section 3.2.1
(Lines 253, 254), and in Section 3.2.2 (Hydra-oriented). My general
feeling about the paper, regarding the so-called "results", is that it
presents more technical aspects of the 3D views, showing more "Hey,
look how the 3D views work" instead of *using the 3D views to actually
do some performance analysis work*, which you have briefly attempted
in those two parts I mentioned above. What I would suggest is that the
3D views should be employed in an exploratory and then a specific
performance analysis to clearly identify a performance problem. Please
note that
in such type of BSP applications - with a bunch of iterations with
compute/communication phases - performance problems are generally
anomalies in a few ranks. How would you detect these anomalies in
Figure 10 for instance?

In all the views you have shown, you select "an arbitrary
time-step". So, your views are iteration-oriented. It is very probable
that the accumulated metric you are depicting hides important
fluctuations (delays in only a subset of ranks, etc) in the metrics, so
I wonder how you would cope with the "temporal aspect" of such
metrics. Vampir (and many of the other tools focused on the
MPI/OpenMP/GPU realm) has a timeline; it is at the core of the tool.
Here in these 3D views, such a timeline is absent. How do you see the
temporal evolution then? How do you compare two iterations? Or several?
Or a trend that appears in several iterations? Such types of
limitations of the views should be more clearly stated or developed.

_Overhead_

Baseline refers to the "pure simulation code" and then you present the
time with the plugin and with score-p only (Tables 1 and 2). The first
thing I don't grasp is how the execution can be faster (-1%) when you
have score-p. That does not make sense because you are doing more
things. I'd suggest to correctly incorporate variability analysis
(check measurement distribution, standard error based on the verified
assumption of a given type of distribution, etc) in your
interpretation. For example, in Table 2 you have "-1%" of MG but your
baseline has a standard deviation of +- 2%. So, that "-1%" might not
be significant after all. So I agree in part when you write "the
plugin’s or Score-P’s footprints lie within the statistical
oscillation of the baseline results" but I find those percentages
weird because they do not reflect the variability of the plugin or
scorep measurements but their relation with the baseline. I agree "in
part" because the running time of Hydra is 12% slower than the
baseline. This should be more clearly marked, helping to extend the
runtime overhead analysis that is, in its current state, too brief
(only three lines 304-306).
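
As a rough illustration of the variability check suggested above, the
sketch below compares a relative overhead against the relative spread
of the measurements. It is a crude comparison under stated
assumptions, not a formal significance test: the function names are
hypothetical, each configuration is assumed to have been run several
times, and any run times passed in for illustration would be invented
values, not the numbers from Tables 1 and 2.

```python
import statistics


def relative_overhead(baseline_runs, instrumented_runs):
    """Mean slowdown of the instrumented runs relative to the baseline mean."""
    base = statistics.mean(baseline_runs)
    return (statistics.mean(instrumented_runs) - base) / base


def relative_spread(runs):
    """Sample standard deviation divided by the mean of the runs."""
    return statistics.stdev(runs) / statistics.mean(runs)


def within_noise(baseline_runs, instrumented_runs):
    """True if the overhead magnitude does not exceed either sample's spread."""
    overhead = abs(relative_overhead(baseline_runs, instrumented_runs))
    noise = max(relative_spread(baseline_runs),
                relative_spread(instrumented_runs))
    return overhead <= noise
```

Reporting the spread of the plugin and Score-P runs themselves, and
not only their ratio to the baseline, would make a case such as "-1%"
against "+- 2%" directly interpretable.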

_Notes_
- There is an excessive usage of footnotes (see Pg11/15 for
instance). Footnotes usually break the reading flow so I'd recommend
either incorporating your discussion in the text (if they are
paramount to understanding) or removing them.

_Summary_

The paper is interesting in itself because it is a new effort in 3D
visualization of performance metrics. This comes after several other
efforts targeting the same goal in past years. I wonder why those
efforts do not appear more often in papers that focus on the
performance analysis of HPC applications instead of proposing
methods. I pay attention to the fact that very frequently 3D views
appear in related tools but they fail to actually be useful in a
realistic performance analysis procedure. So, my concerns are:
1. discussion of the differences against related work (other attempts
that depict very similar views of this contribution's topology
view)
2. the usefulness to clearly detect performance problems and then,
after the fix, use the same visualization to show that the problem
is gone.

Reviewer 3 ·

Basic reporting

The basic reporting needs to be improved in several ways.

Writing
----------

Overall the text is readable and mostly understandable. However, it needs significant wordsmithing to make it actually easily readable. It contains a lot of text in parentheses that makes reading less easy. Sometimes formulations are not optimal and order of words is wrong, see e. g.
- line 20: "added to such plugin"
- line 88: the developer is referred to with the pronoun "it"
- line 123: "The objective aimed by this ..."
- line 139: "to the simulation original geometry" -> add 's
- line 182: "whereas BT for 1000"
- line 182: "The plugin would generate". Why "would"? Does it or does it not?
- line 256: "generate automatically" -> adverbs usually should be in front of the verb not behind it
- line 256: "onto a file" ... more commonly "into" than "onto"
- line 264: What does "Inclusively" mean here?
- line 295: "somewhen" is archaic. Use "sometime".
- line 326: "do not use to be"

It appears a bit strange to me to read the judgement "a fine example" if the authors refer to their *own* previous work. (line 34)

I found a *central* part (lines 65 to 79) of the introduction not to be well understandable. What does the "flipping" mean in detail? What does "performance ... inside in situ" mean in contrast to "in situ inside performance"?

The statement in Line 166 is at the section level and describes what will be discussed on the subsection level, while the related statement in line 206 is on the subsection level and describes what is discussed in the subsection. This is inconsistent.

What does "in the hyperlink" in line 99 refer to?

I was surprised to see that acknowledgements section is mentioned (line 86) in the description of how the paper is organized. I have never seen that before and I would remove it.

In line 143/144 the paper talks about "the generation of ... files ...(by means of the VTK ...". Are the files really *generated* by VTK or are they *visualized* using VTK?

Finally, stating that one has "extended our software" in the first line of the conclusion does not seem appropriate. The software should be explicitly named here. In other words: Which software?


References
---------------

The related work section appears to list relevant overview literature.

At some other places in the paper, however, I am missing references supporting specific statements and claims:
- Line 50: References supporting the claim "for decades" are needed.
- Line 97: A reference where to find the "Tools Guide" is needed.
- In line 177: "modified Class D" should be explained or a reference explaining it needs to be provided.

Figures
----------
In general the figures demonstrate the proposed approach well. Unfortunately, they are not always as clear as possible.

My main problem with the figures is that it is often hard to know which color bar shows the scale for which of the visualized data. This has several reasons.
First, the coloring used is sometimes the same for different quantities but the color bars are only shown in the corners of the images. A relation between color bar and the parts of the visualization where they are used is thus hardly possible without very carefully reading the text. Additionally, most of the needed descriptions are only provided in the running text and not in the caption.
Second, the meaning of the values at some of the color bars is unclear. For example, the switch_id has values between 0 and 1.2e-38. I assume that both ends of the scale simply refer to the same zero value. But this is obscured by the values next to the color bar.

Experimental design

The very basic experiments appear to be well designed. Two common benchmark applications as well as one industry application are used to illustrate the software's performance and memory overhead as well as the resulting visualizations.

An evaluation of or even expert comments on the usefulness of the presented approach is missing entirely.

Thus the paper demonstrates that the approach can be used but not that it is actually useful. A discussion of how the new approach/tool helped or even could help an expert user is needed.

Validity of the findings

No additional comments.

Additional comments

The paper presents an interesting approach/tool for enhanced performance analysis in highly parallel computation applications. Demonstration of the usefulness and the writing need to be improved to make the paper ready for publication. To allow for these improvements to be made, I recommend a major revision.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.