Review History

All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.



  • The initial submission of this article was received on November 11th, 2020 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on December 14th, 2020.
  • The first revision was submitted on April 13th, 2021 and was reviewed by 3 reviewers and the Academic Editor.
  • A further revision was submitted on May 20th, 2021 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on May 28th, 2021.

Version 0.3 (accepted)

· May 28, 2021 · Academic Editor


Congratulations to you and your team on a job well done! The reviewers and I appreciate your detailed responses and your work.

Version 0.2

· May 10, 2021 · Academic Editor

Minor Revisions

Thanks for the revision! The reviewers are quite happy with the revision and appreciate your work to address their concerns. However, they had some minor comments. Please address them in an updated minor revision.



Basic reporting

I suggest proofreading the article. Some typos need to be fixed before publication. It is probably because of the "diff" document, but please, revise accordingly.

Just some examples:
Line 390 -> GrimoireLaballows - GrimoireLab allows
Line 301 -> Percevaland - Perceval and
Line 426 and 428 -> Gerrit efficiency) - remove a bracket
In this section we discuss - ","
Table 6 needs to be formatted like the other tables

Experimental design

Section 2 was improved to better describe the proposed software architecture.

Validity of the findings

The authors properly improved the examples and usage scenario. Also, the replication package was updated.

Additional comments

I am grateful for the authors' efforts to reorganize the article and address it according to the reviewers' suggestions. I have no more suggestions for improving the article.

Reviewer 2 ·

Basic reporting

I thank the authors for their effort in addressing all comments of the major revision. The two main problems related to the presentation were thoroughly addressed, the description and motivation of GrimoireLab’s components were improved, and a new case study to evaluate GrimoireLab’s performance was presented. Some minor specific points are still unclear. Below I describe each of these specific points:

1. The authors improved the motivation for GrimoireLab, specifically by adding Table 6 and a sentence to the introduction. The sentence in [L114-119] still lacks a clarification of how GrimoireLab “improve the situation in solving the practical problem of retrieving data from software development repositories, preparing it for further analysis, and providing basic analysis and visualization tools help in exploratory studies”. Is this due to the holistic approach taken by the tool? In the description of the features shown in Table 6, please clarify the meaning of the “selected projects” scope. Does the “collection of projects” need to be in a git repository? What is the difference with respect to “GitHub subset”?
2. If possible, it would be preferable to adopt a consistent notation for the arrows throughout the figures that explain GrimoireLab’s architecture. For example, the dashed arrow is used to represent invocations in Figure 6, workflow in Figure 7, and “entities flow” in Figure 10.
3. I believe that a brief description of the “conservative approach” to identity management, as well as the examples given in the response letter, would be of great aid to readers.
4. Typos:
a. [L230] missing closing parenthesis.
b. [L390] “GrimoireLaballows”, “Percevaland”
c. Caption of Figure 10: “sold arrows”
d. [L731] “the the”
5. Sentences that could have grammar revision: [L156-157] “… and may need some other components to work, which in that cases are installed as dependencies of it.”

Experimental design

Please, see the comment in "basic report". The new version of the manuscript improves the experimental design by adding a new case study to evaluate the performance of the tool.

Validity of the findings

Please, see the comment in "basic report".


Basic reporting

See previous review.

Experimental design

See previous review.

Validity of the findings

See previous review.

Additional comments

The authors have diligently addressed almost all the review comments. As a result, the paper now clarifies many important issues. I do not think that another review round is required. However, the authors might want to take the following comments into account when revising their manuscript.

The figures are now more readable. The Perceval class diagram is useful,
but I don't think that UML class diagrams are the most appropriate for
illustrating the other tools. I would still have preferred to see
them follow a UML notation (e.g. that of collaboration diagrams) rather than
the invented ad hoc ones, but I think that this choice can be left to the authors.

The system's performance is now nicely illustrated in Table 5. The numbers
for enriched issues and merges may indicate a pathology regarding I/O
operations; you might want to investigate why clock time is two orders of
magnitude higher than processor time only for these specific two use cases. Could caching or better indexes help?

Table 6 now provides a very useful summary of related systems. Consider
adding the corresponding references below the headings or in a separate row at the end.

Line 898,899: Consider rephrasing as "GrimoireLab provides a simple way
of getting the data needed for *a* study in reasonable time"

Version 0.1 (original submission)

· Dec 14, 2020 · Academic Editor

Major Revisions

Overall the reviewers are quite positive about the paper and the work. All reviewers also appreciate the authors' contributions to the community!

Nevertheless, the reviewers raise a good number of concerns that are mostly related to the presentation, writeup and flow of the paper content. Also, reviewer 3 pointed out concerns about the installation not working and the need for a replication package. Please address these concerns and prepare a detailed rebuttal that explains how you went about addressing these concerns.


Basic reporting

Here are my concerns:

- Introduction. First paragraph. "They showed how data retrieval..."; "They explored the limits to scalability"; etc. -> Who is the subject the authors refer to when using "They"? The papers cited? Previous work? Make this clear in the paragraph. Maybe the entire paragraph can be rephrased.

- Introduction. It would be nice to have the aim of the paper described more clearly. From my perspective, the goal is to present the infrastructure and tools, to provide architectural details about them, and also some "usage scenario - proof of concept" to validate the architecture. Making the aim clear lets the authors better set the reader's expectations for the manuscript, since it contains a lot of technical terms and examples. The goal can be better defined in the abstract as well.

- GHArchive Grigorik (NA) - ???, SARA (NA) - ???? - It seems that some references are missing. Please, revise all the references in the manuscript.

- Figures need to be improved. The first eight figures are large with large icons. Figures 9 to 14 have some colors that cannot be visualized by colorblind persons. The different types of icons used are not explained in the text. I also recommend improving the image quality (some parts are blurred).

- 1.3 Contributions. -> Should the fact of having an open-source community around GrimoireLab be stated as a contribution?

- Sections 2.1 and 2.4 -> Must have a general explanation about the data retrieval component, as we have for Sections 2.2 and 2.3.

- Lines 133 to 143 must be described together with the Perceval explanation (line 125).

- Section 4 - Table 4. "Retrieval (github) 10:02:12 hours 160 items/min" -> Does "items" mean pull requests/issues and their messages? Commit metadata? - Please clarify in the text.

- EXAMPLES OF METRICS AND VISUALIZATIONS Section -> The authors provided examples and explanations about metrics and visualizations available. My concern here is: Is it possible to create new metrics and visualizations? How can the users extend it? It would be worth having some instructions and explanations.

- Ethical issues. Recently, GHTorrent dealt with some ethical issues (the #issue32 incident) - I think this topic deserves some special discussion in this paper. Is there any concern/issue/implication for researchers/practitioners when using GrimoireLab?

- The paper has too many bullets (e.g., Related Work, component descriptions, Discussion, Conclusions, etc.). I suggest reviewing the manuscript to avoid this practice, which breaks the text's fluency.

- Line 343 - GrimoireLabis - GrimoireLab is
- Line 378 - Bugzilla timing9): -> Exclude )
- The term "experiment" is used to describe the case studies/scenarios

Experimental design

No comments. The manuscript describes the Metrics Grimoire infrastructure. The infrastructure is validated with a "proof of concept" presented by describing some usage scenarios.

Validity of the findings

No comments. The validation was described by presenting some usage scenarios, by providing the replication package and access to the tools and source code.

Additional comments

The paper presents GrimoireLab, a toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. The toolset has been tested in many environments, for performing different kinds of studies, and providing different kinds of services. The components and infrastructure are presented. The authors used some usage scenarios to validate the presented toolset.

In general, I liked the paper's idea. As a researcher, I would also like to thank the authors for their effort to build this set of tools and the infrastructure. A scenario replication is available, which is really good for practitioners and researchers. However, the paper needs to address presentation issues (image quality, too many bullets, etc.). It is also worth including discussions about some potential ethical issues and metrics/visualization extension.

Check my comments to improve the paper manuscript.

Reviewer 2 ·

Basic reporting

Comments about basic reporting are done in the "General comments for the author" field.

Experimental design

Comments about experimental design are done in the "General comments for the author" field.

Validity of the findings

Comments about the validity of the findings are done in the "General comments for the author" field.

Additional comments

# General comments

This paper presents GrimoireLab, a set of components for data retrieval and analysis of software repositories. The paper describes the structure and functionality of the main components of GrimoireLab (data retrieval, data storage, identities management, analytics, and orchestration). The paper presents real-world use cases of the proposed components, as well as examples of metrics that can be calculated and visualizations that can be generated by GrimoireLab. The paper also provides a discussion of challenges that need to be addressed to collect and analyze software engineering data in both industrial and academic settings. The strongest point and greatest contribution of the paper is a set of components that were developed during many years of research effort and have matured as open source projects. As for the weakness associated with this paper, it is possible to highlight the lack of validation for the claims made as discussion points. Below I provide specific comments on the overall paper presentation and on each section of the paper.

# Overall presentation

In general, the paper is adequately written but a few clarifications are needed, which I specifically point out in the detailed comments for each section. Regarding presentation, there are two major issues.

- The first issue is the excessive usage of pronouns with an unclear reference (i.e., antecedent noun), which yields too many ambiguous sentences. For example, in the sentence “They showed how data retrieval …” [P2,L46], the antecedent noun is not clear (does “they” refer to repositories or tools?). Occasionally, the antecedent noun can be inferred from the context, however this is not always the case. Other examples include: “They explored the limits of scalability …” [P2,L47], “They demonstrate …” [P2,L48], “Some of them …” [P2,L56], “Most of them …”[P2,L63], “They use …”[P2,L64], “… of how it …” [P3,L105], “… that is can provide.” [P3,L106], “… contained in it.” [P8,L239], “… from them …” [P8,L243], “… these cases …” [P8,L245], “… visualize it.“ [P9,L277], “… all of these scenarios …” [P11,L304], “.. some of them,” [P11,L308], “… its analysis starts,” [P12,L318], “… all its items …” [P12,L318], “… depending on it.” [P18,L405], “… deals with this …” [P18,L411]. Many other similar cases appear throughout the paper.

- The second issue is the occurrence of vague descriptions of rather important information. For example, in the sentence “When the data to be collect is really large …”, it is hard to judge what “really large” is. For such technical descriptions, the text should be as precise as possible. Other examples include: “… some idea …” [P11,L312], “… for a little while …” [P12,L319], “… every few minutes.” [P14,L336], “… some data …” [P14,L338], “… some custom Kibana visualizations …” [P14,L351], “… key data …” [P14,L357], “… some exceptions …” [P16,L365], “… some fields …” [caption of Figure 19], “… in some cases …” [P20,L504], [P20,L507], “… some other goodies …” [P21,L561], “… some if its properties.” [P21,L577].

- Typos: “… we will us …” instead of “… we will use …” [L111], “… HatStallThen,” instead of “… HatStall. Then,” [P9,L277], “Data collection for “git” Includes …” instead of “Data collection for git includes …” [P12,L324], “GrimoireLabis …” instead of “GrimoireLab is …” [P14,L343], “… more indexes is produced …” instead of “… more indexes are produced …” [P16,L364], “… the should fixed …” instead of “they should be fixed …” [P19,L26], “… kinds consumption…” [P19,L480] instead of “… kinds of consumption …”, “This make their …” [P20,L508]

- [P20,L494-502] The usage of the word “you” in that context is discouraged and inconsistent with the rest of the paper.

- Unclear sentences: “… consult analysis and visualizations of it” [P2,L56], “… specifically to retrieve source code or data” [P2,L59], “Interlaced with this process, SortingHat process identities found.” [P12,L320], “For many data sources, this is software not difficult to write …” [P19,L26], “… plugged to the data.” [P19,L447]

- Citation format: all citations are in the format “Author et al. (Year)”, while in many cases the correct format seems to be “(Author et al., Year)”

# Introduction

Main point: The motivation for another system to collect and analyze data from software repositories is not well described. Although the Introduction reviews relevant literature, there is no clear description of the motivation for a study on another related system. For example, while the Introduction says “In many cases they are not easy to reply and operate, and in others they are difficult to use for large-scale, continuous data retrieval. Not all of them provide support for retrieval, storage and analysis of the data …”, it is not clear which of these limitations GrimoireLab addresses. Also, there is no description of how and why the approach taken by GrimoireLab to tackle those limitations is better than the approach taken by other systems that also tackle these limitations. Most importantly, the Introduction describes no empirical evidence that GrimoireLab can overcome previous systems’ limitations. I would suggest having a table that compares the features and functionalities of the related systems with the features and functionalities of GrimoireLab.

[P1,L42] Regarding the definitions of “… tools and complete systems …”, it is not clear what the difference is. Is GrimoireLab being proposed as a tool or a complete system?

[P2,L49] The sentence “… different approaches to avoid harming the project hosting systems …” needs clarification. What are those approaches and what type of harm are they trying to avoid?

[P2,L75] Although the types of repositories whose data collection is supported by GrimoireLab are defined in later sections, it would be interesting to define (or perhaps give examples of) different types of repositories in the Introduction.

[P2,L80-82] Is the claim of efficiency validated in the paper? It would be nice to describe how efficient GrimoireLab is compared with other solutions that are mentioned in the Introduction.

[P2,L84] It would be nice to motivate the identity management module upfront in the Introduction. When “module for identity management (…) in combination with custom code to merge or tag identities” is mentioned for the first time, it is hard to understand what this module does. Also, if this feature is unique for GrimoireLab, this should be highlighted.

[Section 1.3] The list of contributions reads more like a list of features of GrimoireLab. Beyond GrimoireLab itself (which is a great contribution), I would expect to see other contributions in terms of 1) novel techniques to investigate the efficiency and efficacy of GrimoireLab in coping with the described challenges, 2) novel findings or results of applying GrimoireLab in the field, or 3) data that allows the comparison between GrimoireLab and related systems. Moreover, Section 1.3 seems like the right place to thoroughly describe the differences and similarities between GrimoireLab and the related systems.

[Section 1.4] Some definitions appear after their first usage (e.g., data source, kind of data source). I would suggest having a structured glossary in this section, with a formalization of all definitions used in the paper. For example, the definition of “kind of data source” is imprecise, as the data sources given as examples have notably different APIs. Also, readers that are less familiar with mining software repositories would benefit from having a definition of related terms.

# Section 2

Main points: 1) There is no discussion of the rationale behind the existence of each component. The paper describes the functionality of each component. To some degree, the relationship between these components is also described. However, except for the identity management component, it is unclear what challenges each component solves, both in terms of structure (e.g., why is this specific architectural design chosen?) and functionality (e.g., how does a component overcome the current limitations of data collection and analysis?). It is also unclear how GrimoireLab addresses the flaws (in terms of design, efficiency, or efficacy) of related systems. 2) Given the paper's descriptive characteristics, I would recommend having a formal notation (e.g., UML) to describe the components and their relationships (Figures 1-7 and Figures 10-14). For example, the meaning of the relationship shown between components (Figures 1-7) is not defined (does the arrow represent data flow? functional dependency? or something else?). Also, although some colour and shape code is used in Figures 9-14, the meaning of such codes is unclear. 3) There is no discussion regarding the separation of concerns among components.

[P3,L104] Could one interpret the sentence "... describes the structure of GrimoireLab and its module ..." as "... describes GrimoireLab's architecture ..."? If so, which architectural level does the paper describe? Also, the text interchangeably mixes the words "structure" and "design". Please, clarify if they have the same intended meaning.

[P3,L128-131] "Graal runs third party tools on git repositories.", "... runs a collection of tools on checkouts ...", "... produced by those tools ...". The mentioned tools need to be clearly defined, at least in terms of their outputs.

[P5,L150-159] Please, clarify what the "common features" that are reused to collect data from different data sources are, and which components implement those features.

[P5,L164] "... controlling the details of the job ...". Which are the controlled details and why?

[Perceval and Graal] Perceval and Graal components seem to have a strong coupling. Please, clarify how they are typically integrated.

[Section 2.3] The motivation for the identity management component is well described. However, I would recommend addressing the following points: 1) There are many academic papers describing different heuristics for identity disambiguation (e.g., resolving users that adopt different e-mail addresses to commit to the codebase). Different heuristics are associated with different performance (e.g., accuracy). How does GrimoireLab's approach to identity disambiguation compare with related work in this area? What is the performance of GrimoireLab's approach? 2) "... sends them back to be added to the enriched data." [P6,L202] is not clear in Figure 7. Also, by Figure 7 it is unclear how "enriched data" relates to other sub-components of SortingHat. In general, the textual description of the figures needs to be improved.

[Section 2.4] Although the section states Visualization and Analytics, what is actually described relates to visualization and data exporting. Also, please clarify which "documents" are referenced in [P7,L213] and what "actionable inspection" [P7,L216] means in this context.

[Enriched indexes] The concept of "enriched index" needs to be clarified and better formalized.

[Figures] There is excessive usage of figures that hinders readability. While I appreciate the effort to describe the relationship between GrimoireLab's components incrementally, many of the figures are redundant and could be merged into a single one that conveys the same message. Also, the textual description of many figures (e.g., Figures 4 and 8) is lacking.

# Section 3

Main point: this section lacks a detailed description of the techniques used by each component. Also, it seems that a large portion of the section's contents overlaps with the previous section's contents (in particular, most of the section describes the relation between components and how they individually work). Please, consider improving the separation between the sections.

[Section 3.1] What are the advantages of using Perceval compared to having a script that uses curl with pagination, for example? Also, does Perceval collect only issues and pull requests from GitHub? If that is the case, the paper needs to discuss this issue as a limitation of GrimoireLab. More generally, although the system can collect data from more than 30 data sources, not all data from each source can be collected.

[P8,L242] Section 3.1 describes a data retrieval scenario and claims that "... which will use Perceval (as library) in the background, to clone the repository and get the list of commits, and then run third party tools that analyze each relevant file in each commit to get complexity metrics from them in a single JSON". It seems that Perceval is doing both data retrieval (e.g., cloning repositories and getting commits) and data analysis (e.g., calculating complexity metrics). I wonder how much this characteristic of GrimoireLab (specifically, Perceval) is based on best practices. For example, why does the same component perform retrieval and analysis? As these are different concerns, it seems that different components should address them. Another question raised by the quoted sentence is how "relevant files" are set in Perceval.

[P9,L276] In the sentence, "SortingHat will work based on heuristics ...", what are those heuristics and how well do they perform?

[P9,L282-284] The technique adopted by GrimoireLab for incremental data retrieval needs to be much better described. For example, it is unclear from the sentence "... which can be used to select which items from the database need to be retrieved." [P10,L283] how such items are selected. Also, the sentences "... how often will data sources be visited for incremental retrieval." [P10,L287] and "... refresh periods (how often data will be retrieved incrementally from repositories)" [P10,L290] give the impression that the incremental data retrieval is not real-time but instead performed in batches. If that is the case, how does this feature compare with real-time techniques such as processing an event stream?

[P10,L294] The paper says that "... Graal analyzes source code, by running third party tools with the help of Perceval.". From Figure 2, it is possible to verify that Graal is within the "data retrieval" component. This observation evidences that the paper lacks a discussion on the separation of concerns of the components. Also, there is no illustration of the relationship between Graal, third-party tools, and Perceval in any of the figures.

# Section 4

Main points: I am surprised by how much Section 4 overlaps with prior sections. It claims to describe “three real use cases for GrimoireLab”, while the previous section claims that “GrimoireLab can be used in many different scenarios. In this section we describe some of them …”. Please, consider improving the separation of the contents described in Sections 2, 3 and 4. It seems that Section 4 has the potential to describe insightful empirical studies regarding GrimoireLab. However, there is no definition of research questions, no description of rigorous methods to perform such empirical studies, and no discussion of the lessons learned. The lack of a clear and well-thought-out study design leads to severe issues with the validity of the claims.

[Section 4.1] The main takeaway from the analysis presented in this section does not stand out. For example, how does the running time change by changing the associated factors (what are the factors in the first place)? Also, what is the baseline for comparison of the presented results? How good or bad are the presented results compared with other similar systems? What are the strengths and limitations?

[P11,L312] How is GrimoireLab’s performance evaluated?

[P11,L314] The sentence “The main figures of that experiment are included in Table 1”. I do not understand what “figures” means, nor which “experiment” is described. In addition, what is Mordred in the sentence "The experiment was run by Mordred"?

[P12,L319] "... sleep for a little while ...". This sentence is imprecise. Also, how much of the "little while" is reflected in the results shown in Table 2?

[P12,L320] "The data shown in this paper ...". Please, clarify which data.

[P12,L324] "Data collection for git includes cloning of git repositories for GitHub, and production of the raw index by analyzing those clones". How is each of these steps reflected in the results shown in Table 2?

[Table 3] There is no reference to Table 3 in the text.

[Section 4.2] What is the main takeaway of Section 4.2?

[P14,L329] “Table 4 shows data about an example of large deployment of GrimoireLab”. What does “large deployment” mean?

[P14,L335-336] “The deployment has been running continuously for more than four years, retrieving data incrementally from repositories every few minutes”. How many minutes?

# Section 5

[P16,L365-367] “… metrics are not (except for some exceptions) a part of the indexes stored in Elasticsearch: they are computed either by the visualizations, or by the tools producing the reports.”. What is the rationale behind this design decision? Why is it better to recalculate the metrics instead of storing them?

# Section 6

Main point: this section needs to improve the discussion of the novel lessons learned after years of research and development of GrimoireLab. For example, prior studies discuss the same issues pointed out here: “minimizing interactions with data sources”, “data source APIs are not uniform”, and “data provided are not uniform”. How good or bad are the solutions proposed by GrimoireLab? More importantly, which findings and evidence support the claims made in this section? Without clearly providing evidence to support the discussion, most of the claims read as speculation. Another important point is that, in general, there is a good amount of discussion about what the system does, but barely any discussion of how and why.

Some excerpts in this section deserve clarifications. Below I point to specific excerpts:

[P18,L405] “… do some different tricks depending on it”. Please, clarify what those “tricks” are and provide the motivation for using them. Preferably, use more specific and technical wording.

[P18,L414] “… the code may be more generic …”. It is unclear which “code” is being referenced.

[P19,L450] “… to address the first approach …”. Please, clarify which “first approach”, as the previous sentence was talking about performance issues of data collection.

[P19,L450] “… to avoid those that could cause trouble.” Please, clarify what type of trouble. Also, please, prefer a more technical and specific wording.

[P19,L470-471] “GrimoireLab recovers as nicely as possible by retrying, and by continuing after failure”. Please, elaborate on how this feature is implemented. Is there any increase in the waiting time after retrying? How is each type of failure handled? Is the failure handling approach dependent on the data source?

[P19,L475-476] How can consumers detect errors in the retrieved data? What are those errors? What are the most common ones?

[P19,L477-478] What is a “bug in time interpretation”?

[P20,L494-502] Reproducibility: why provide a reproducible build of GrimoireLab instead of providing a dump of previously collected data? The argument that “that would be enough for anyone to get the same data again, provided data sources didn’t change” does not hold, as rarely will a data source not change.

# Section 7

[P20,L503-506] I do not understand what this paragraph conveys. For example, what does “production of merged collections” mean? Please, clarify the sentence “The availability of origin fields simplify the merge, allowing for complex combinations, such as using raw indexes for different data sources to compose a meta study”.

# Section 8

[P20,L530-533] How did the two proposed categories of related work arise? Is this classification by the authors themselves? What process was used to achieve this classification?

[P20,L541-542] “… not only metadata about the projects, but as much data as possible about software development …”. Please, clarify what are the differences between “metadata about the projects” and “data about software development”.

[P21,L563-567] I appreciate the comparison between GrimoireLab and GHTorrent (same for the paragraph of [P21,L591-599]). While I agree that using the projects API will require a reduced number of queries to the endpoint, the events API (as used by GHTorrent) provides a richer source of data about the activities performed in a repository. Please, clarify which type of data can be obtained by using the projects API in comparison with the events API and vice-versa.

[P21,L576-577] “Boa is a system …” [L576] and “Boa is a programming language …” [L577]. What is Boa anyways?

[Lack of subsections] To improve readability, I would recommend separating Section 8 into sub-sections (per related work’s theme).

[P22,L628] Please, define the acronym “SQO-OSS”.

# Section 9

I appreciate seeing a mature and well-supported set of tools being provided to the academic and industrial communities of mining software repositories. Many of the provided tools are tested in-field and used to support different types of research, which clearly shows the value of those tools. As a side note, it would be a great addition to this paper if any lessons learned from the usage of GrimoireLab by the core services of a company could be discussed, even if the discussion stems from public information.

# Section 10

[P23,L685] “… an industry-strength platform …”. What evidence is given for this claim?

[P32,L687-699] This excerpt claims unique characteristics of GrimoireLab that differentiate it from other tools. However, in the list of characteristics, we can read “support of different data sources”, “flexibility and configurability”, “identity merging”, “tested in real-world”, etc. Unfortunately, none of these characteristics are “unique” to GrimoireLab, and they can be observed in other systems or academic papers. Perhaps the unique feature of GrimoireLab is how all these characteristics are put together, which reinforces the need to present a strong comparison between GrimoireLab and other existing related systems.


Basic reporting

Overall the manuscript is well structured. The writing is clear, with only a few syntax, grammar, or spelling mistakes. I recommend carefully re-reading the references to fix some capitalization errors.

The numerous figures provided are helpful. I suggest using a standardized notation (such as UML) or explaining the meaning of the various shapes used (e.g., rectangle vs. ellipse vs. rounded rectangle). Also, I recommend using a lighter color for the purple shapes, which are difficult to read.

Extend Table 2 to include the time it took for each phase to run.

Suffixing the duration with hours/min in Table 3 is confusing. Provide all durations simply in HH:MM:SS units.

Table 3: Clarify what "items" refers to.

I recommend rephrasing the abstract's "Conclusions" section to state what GrimoireLab has achieved, rather than summarize the paper.

Experimental design

According to the framework provided by Stol and Fitzgerald (2018), this is a solution-seeking rather than a knowledge-seeking study. Solution-seeking studies aim to engineer solutions to practical problems. In such studies, researchers design, create, or develop solutions for a software engineering challenge; in this case, by building the GrimoireLab tools. As such, I do not expect a research question filling an identified knowledge gap. Instead, I expect to see a description of a tool advancing the state of the art.

The described system (GrimoireLab) advances the state of the art by supporting more data sources and analyses as well as enhanced interoperability. The study can benefit from a rigorous comparison between GrimoireLab and other existing systems in terms of performance, supported data sources, and analyses.

Two case studies, one on IoT repositories and one on Wikimedia Foundation repositories, illustrate the power of GrimoireLab. These could be profitably extended to showcase which research questions can be answered through the provided quantitative data.

Validity of the findings

The provided conclusions mostly follow the conducted study. They can be extended through more rigorous benchmarking and more ambitious example studies.

The authors provide links to a companion package on Zenodo, which contains data, logs, and configuration files for the IoT case. I recommend expanding this to include the Wikimedia Foundation case, as well as additional data supporting the recommended additions. Uploading a source code snapshot to Zenodo can ensure the source's long-term survival.

The provided online documentation is quite detailed and well-written.

I was able to install and run Perceval using the provided instructions. On the other hand, the installation of GrimoireLab appeared to get stuck in a loop, printing lines such as the following.

Requirement already satisfied: setuptools in ./venv/lib/python3.7/site-packages (from importlib-metadata->flake8>=3.7.7->graal==0.2.3->grimoirelab) (50.3.2)
Requirement already satisfied: wheel in ./venv/lib/python3.7/site-packages (from importlib-metadata->flake8>=3.7.7->graal==0.2.3->grimoirelab) (0.36.1)
Requirement already satisfied: setuptools in ./venv/lib/python3.7/site-packages (from importlib-metadata->flake8>=3.7.7->graal==0.2.3->grimoirelab) (50.3.2)
Requirement already satisfied: wheel in ./venv/lib/python3.7/site-packages (from importlib-metadata->flake8>=3.7.7->graal==0.2.3->grimoirelab) (0.36.1)

The denormalized output of the GrimoireLab tools seems to result in considerable waste. For example, each commit contains, again and again, the full details of its author and committer: about 49 elements. What is the rationale for this decision, and what is its effect in terms of wasted storage and processing cost?
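The storage concern above can be made concrete with a small sketch. The field names and identity record below are invented for illustration and do not reflect GrimoireLab's actual schema; the point is only that embedding the same author/committer record in every commit grows the output linearly, whereas storing each distinct identity once and referencing it by key does not.

```python
import json

# Hypothetical denormalized output: every commit embeds the full
# author/committer record (invented fields, not GrimoireLab's schema).
author = {"name": "A. Dev", "email": "a@example.org", "org": "Example"}
commits = [{"hash": f"c{i}", "author": author, "committer": author}
           for i in range(1000)]
denormalized = len(json.dumps(commits))

# Normalized alternative: one identity table, commits reference it by key.
identities = {"id1": author}
normalized = len(json.dumps(
    [{"hash": f"c{i}", "author": "id1", "committer": "id1"}
     for i in range(1000)])) + len(json.dumps(identities))

print(denormalized, ">", normalized)  # repeated records dominate the size
```

On the other hand, denormalized documents are self-contained, which suits document stores such as Elasticsearch; the trade-off would be worth discussing in the manuscript.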

Additional comments

L. 36: Your phrasing focuses on the process. Consider stating that tools provide data about the software development process and the developed artifacts.

L. 91: The described attributes refer to coverage breadth, rather than extensibility.

L. 129: Generally, running tools on a commit's checkout is needlessly expensive. Accessing Git's internal file system directly is more efficient, because it avoids the cost of copying all files to the OS's file system.

L. 244: I find it a pity that Graal uses Perceval internally, rather than allowing the user to compose various tools through a common protocol and data format.

L. 254: Consider providing an actual example, rather than an abstract description. How could one e.g. measure the evolution of comment spelling mistakes over time?
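The kind of concrete example requested above could look like the following sketch. Everything here is hypothetical: the revision snapshots and the tiny word list are made up, and a real study would extract comments per commit (e.g., via Graal) and check them against a proper dictionary.

```python
# Hypothetical sketch: tracking the number of misspelled words in code
# comments across successive revisions of a repository.

DICTIONARY = {"compute", "the", "total", "value", "cache", "result"}

snapshots = {          # revision -> comment text at that revision (made up)
    "r1": "compute teh total valu",
    "r2": "compute the total valu",
    "r3": "compute the total value",
}

def misspellings(text):
    """Count words not found in the (toy) dictionary."""
    return sum(1 for word in text.split() if word not in DICTIONARY)

trend = {rev: misspellings(text) for rev, text in snapshots.items()}
print(trend)  # {'r1': 2, 'r2': 1, 'r3': 0}
```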

L. 490: I recommend backing up this claim with benchmark results.

L. 855: One citation to the smartshark ecosystem should be enough.

I recommend clarifying the following questions in the manuscript:

- How does GrimoireLab handle GitHub pull requests?
- What is the schema of the data provided by Perceval?
- Do SortingHat and HatStall offer any help in addressing GDPR requirements? How do you propose the tools should be employed for handling them?
- L. 234: Perceval appears to retrieve the repository's metadata and also a clone of the actual repository. How does it handle clashes between repository names from different sources?
- How can GrimoireLab be extended? What does it take to create a new data source or analysis metric? How can such additions be contributed back to the community?
- What does an index hold? What operations does it facilitate?
- What are the practical limits of GrimoireLab in the number of repositories or data volume it can process? On what volumes has it been tested? What support is offered for running GrimoireLab on multiple hosts?
- How is GrimoireLab tested (cf. L. 508)? Are there metrics (e.g., code coverage) that can help track the quality of its testing? Where do these stand today?
- How are the provided GrimoireLab Docker images supported and updated? Given GrimoireLab's evolution, how can GrimoireLab users ensure the long-term replicability of their studies?
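Regarding the extensibility question in the list above: Perceval data sources follow a fetch-items pattern, sketched below in plain Python. The base class here is a simplified stand-in written for this illustration, not Perceval's actual API, and the `TodoList` source is invented.

```python
# Simplified stand-in for a Perceval-style backend; not the real
# perceval.backend.Backend API, only an illustration of the pattern.

class Backend:
    def __init__(self, origin):
        self.origin = origin

    def fetch(self):
        # Wrap each raw item with provenance metadata, analogous to the
        # 'origin' and backend-name fields the real tool attaches.
        for item in self.fetch_items():
            yield {"origin": self.origin,
                   "backend_name": type(self).__name__,
                   "data": item}

    def fetch_items(self):
        raise NotImplementedError  # each data source implements this


class TodoList(Backend):
    """Hypothetical data source: tasks in a plain to-do file."""

    def fetch_items(self):
        for line in ["fix parser", "write docs"]:  # stand-in for real I/O
            yield {"task": line}


items = list(TodoList("file:///tmp/todo.txt").fetch())
print(items[0]["backend_name"], items[0]["data"]["task"])
```

Documenting how such a subclass is written, tested, and contributed upstream would answer the extensibility question directly.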

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.