To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
I believe that the reviewers' comments have been well addressed.
The reviewers have raised some minor issues that I suggest the authors address.
See general comments.
See general comments.
See general comments.
This article provides an informative discussion of the ReScience journal and its motivations. Created in 2015, ReScience publishes replication attempts of previous computational analyses. The initiative is noteworthy as a project that aims to improve the replicability of science and produce open source implementations of past closed science. In addition, the initiative was innovative in adopting GitHub as the submission, review, and publication platform. This design leads to unprecedented transparency in the review process and presumably helps encourage productive and actionable exchanges between reviewers and authors.
I do take issue with some aspects of ReScience's design. However, given that this article is a retrospective examination, modifying the past design is not feasible. Therefore, some of my comments should be considered advice for the future or discussion points to consider, rather than obligatory changes.
The following two statements are worrisome in that they seem to take a defeatist attitude towards computational reproducibility:
> In particular, if both authors and reviewers have essential libraries of their community installed on their computers, they may not notice that these libraries are actually dependencies of the submitted code. While solutions to this problem evidently exist (ReScience could, for example, request that authors make their software work on a standard computational environment supplied in the form of a virtual machine), they represent an additional effort to authors and therefore discourage them from submitting replication work to ReScience.
> the code newly produced for ReScience will likely cease to be functional at some point in the future. Therefore, the long-term value of a ReScience publication is not the actual code but the accompanying article. The combination of the original article and the replication article provide a complete and consistent description of the original work, as evidenced by the fact that replication was possible.
It is highly problematic if ReScience replications are not reproducible (i.e., they cannot be rerun by other researchers in the future). First, enhancing a replication will become difficult. Software and data analyses are never fully complete, and contributions to a replication may continue indefinitely. Furthermore, the replications will often contain useful open source implementations that are adaptable to new problems. Reproducibility will make repurposing the code more feasible. Finally, the value of the initiative is diminished if independent parties cannot verify a replication. Therefore, I think it is shortsighted to consider software reproducibility a secondary concern.
While there is not a one-size-fits-all solution to reproducible computational analyses, there are many approaches that help remedy the situation. For example, the shelf life of software that makes no attempt to control its environment is often only a few months. However, even just specifying the versions of the explicit dependencies, say as a conda environment, could increase the shelf life to several years. Alternatively, containerizing the entire computational environment, e.g. via Docker, could increase the shelf life to decades. I believe that ReScience should strongly encourage controlled environments and dependency versioning. For example, as a code reviewer, I would in most cases refuse to approve a data analysis codebase without versioned inputs and software.
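To make the conda suggestion concrete, here is a minimal sketch of an environment file that pins explicit dependencies. The project name and package versions are illustrative, not taken from any actual ReScience submission:

```yaml
# environment.yml -- hypothetical example of pinning explicit dependencies
name: rescience-replication
channels:
  - conda-forge
dependencies:
  - python=3.6.1
  - numpy=1.12.1
  - scipy=0.19.0
  - matplotlib=2.0.2
```

A future researcher could then recreate the environment with `conda env create --file environment.yml`. Note that this pins only top-level packages; transitive dependencies may still drift over time unless a full lock file or container image is used.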
On a related point, the study does not cite continuous analysis (https://doi.org/10.1038/nbt.3780). Continuous Analysis (controlled environments with automated execution via continuous integration) would potentially be very helpful for ReScience repositories. It would help ensure the reproducibility of analyses and traceability of the results. Continuous Analysis can be quite complex to configure, so this is mostly a suggested reference and consideration going forward.
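As a rough illustration of what continuous analysis could look like for a ReScience repository, the following is a hypothetical Travis CI configuration sketch (the image tag `replication` and the `make all` entry point are assumptions, not part of any existing ReScience setup):

```yaml
# .travis.yml -- hypothetical continuous-analysis sketch:
# on every push, rebuild the containerized environment and rerun the analysis
services:
  - docker
script:
  - docker build --tag replication .
  - docker run replication make all
```

Under this scheme, every revision of the replication is automatically re-executed in a controlled environment, so reviewers and future readers can see whether the analysis still runs.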
> It offers unlimited free use to Open Source projects, defined as projects whose contents are accessible to everyone.
When discussing the pricing model of GitHub, I recommend writing "It offers unlimited free use to public projects" rather than stating "to Open Source". It's important not to further the misconception that public repositories without an open license are open source. Currently, GitHub's [pricing page](https://github.com/pricing) states:
> GitHub is free to use for public and open source projects. Work together across unlimited private repositories with a paid plan.
Can ReScience implementations use code from the original study or are authors required to entirely rewrite the code to encourage true replicability?
The study is lacking a data-driven summary of ReScience's past operations. I'd recommend tables and/or figures that express ReScience's history up to this point. For example, what about a table with all ReScience articles thus far? The table could show the date and details of each original study as well as link to its ReScience replication. Additionally, some quantitative measure of how much total review (in the form of GitHub comments) has occurred would be informative. What's the distribution of codebase sizes, language choices, study lifespans, etc., across ReScience articles? In short, this article could do a better job providing quantitative measures of ReScience's historical growth and mechanics.
To close, my favorite quote from the study is:
> The holy grail of computational science is therefore a reproducible replication of reproducible original work.
The abstract names a certain 'James Buckheit'. It should rather be 'Jonathan Buckheit'.
This is a superbly lucid article.
It describes a project that is exemplary and inspiring.
Because of the unusual nature of the article -- it describes a new scientific publishing effort -- the usual PeerJ reviewer checklist is not appropriate.
I have not published at ReScience (but now I intend to!) and so I cannot vouch for the veracity of any of the statements about the procedures. The description is very clear -- I wasn't even able to spot typos, except for the issue mentioned in line one of this review.
The intellectual principles of science are modeled in an exemplary way by this paper.
So as a scientific project this is first-rate. It is not, however, appropriate to judge this by the PeerJ experiment design checklist.
This paper is necessarily a list of experiences setting up a new scientific publication process.
I am not able to judge the literal truth of the claims about the journal and its operations; however, the description makes me believe it all.
Perhaps someone among the reviewers actually HAS used ReScience and CAN say that things actually work this way.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.