Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on October 5th, 2022 and was peer-reviewed by 3 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on December 14th, 2022.
  • The first revision was submitted on January 6th, 2023 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on January 9th, 2023.

Version 0.2 (accepted)

· Jan 9, 2023 · Academic Editor

Accept

I checked all reviewer comments in detail and you have improved the article accordingly. Well done! The paper is now ready for publication.

[# PeerJ Staff Note - this decision was reviewed and approved by Rong Qu, a PeerJ Computer Science Section Editor covering this Section #]

Version 0.1 (original submission)

· Dec 14, 2022 · Academic Editor

Minor Revisions

The suggested changes from the reviewers are presentation issues. Please improve the following:
- Describe and summarise more clearly the comparison to existing data sets
- Strengthen the motivation for creating and publishing a new data set
- Discuss further uses of the data set
- Discuss in more detail the limitations of the data set

Reviewer 1 ·

Basic reporting

The research background is insufficient. The authors should provide stronger motivation for contributing a new dataset.

Experimental design

The knowledge gap between the existing datasets and this new dataset should be summarized and added to the Introduction section.

Validity of the findings

The authors should enhance the impact of their proposed dataset by pointing out more potential applications as future research directions, as specifically as possible, e.g., with input and output descriptions.

Additional comments

Dear authors, overall this is a good study. My concerns center on two questions:
1. Why is a new dataset needed?
2. How could this new dataset be used in the future?
For the first question, the authors may want to add more discussion in the Introduction section. For the second question, the authors may want to add more discussion in the DOWNSTREAM TASKS section.

Reviewer 2 ·

Basic reporting

The author introduces a novel technique for ML code annotation based on a Machine Learning Taxonomy Tree that reflects the main steps of the ML pipeline. Additionally, the author provides an annotation tool that supports further markup of the dataset.
From a journal perspective, the contribution is minor.

Experimental design

This paper focuses on a large-scale dataset of annotated Machine Learning code. Further technical revision of the algorithm and flowchart of the proposed methodology is required, and the experimental section needs to be clarified.

Validity of the findings

The findings based on the data and analysis experiments need to be revised. For example, different validation techniques need to be applied, and the author needs to show a comparison across these schemes.

Additional comments

The paper needs the following major revisions:
• The limitations of existing methods proposed for similar tasks should be pointed out, and it should be explained how the proposed strategy would address each limitation.
• The results are good, but from a technical perspective the work is mainly a combination of existing technologies, and the contributions could be more evident. The author needs to highlight the contributions of this paper in the revised version.

Reviewer 3 ·

Basic reporting

- Language is clear and unambiguous
- The literature review can be improved a bit.
- Structure, facts and figures are apt, no changes required.
- Results are sound enough.

Experimental design

No changes required.

Validity of the findings

Good enough, no changes required.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.