All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Dear authors, we are pleased to confirm that you have addressed the reviewers' valuable feedback and improved your research accordingly.
Thank you for considering PeerJ Computer Science and submitting your work.
Kind regards
PCoelho
[# PeerJ Staff Note - this decision was reviewed and approved by Shawn M. Gomez, a PeerJ Section Editor covering this Section #]
no comment
no comment
no comment
I thank the authors for the updated version of the document. The new text makes the document stronger and clearer. All my previous concerns have been addressed, and I am satisfied with the result.
Dear authors,
You are advised to respond critically, point by point, to the reviewers' comments when preparing the new version of the manuscript and the rebuttal letter.
Please address all comments and suggestions provided by the reviewers and incorporate the corresponding changes into the new version of the manuscript.
Kind regards,
PCoelho
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
- This manuscript is thorough, well-structured and relevant, aiming to investigate how generative AI can enhance programming education through timely feedback.
- The literature provides a good overview of the topic. The authors provide a detailed and well-supported background, effectively highlighting the potential of generative AI in educational contexts.
- As I am not a native speaker, I will not explicitly judge the language quality. However, some sentences appear overly long and might be more readable if split into shorter ones. For example, lines 83-87 could be reconsidered.
- Feedback is defined twice in the manuscript (lines 40-42 and 120-121) with slightly differing wording. I recommend that the authors consolidate these definitions for clarity.
- Providing an English translation of Figure 3 would be valuable, as it allows readers to assess the response quality and better understand the interactions facilitated by tutorB@t.
- Several minor issues require correction:
- Typo at line 18 ("AI GEN" instead of "GEN AI").
- Brackets missing at line 262.
- Typo at line 377 ("thar" should be "that").
- To my knowledge, sources integrated into sentences should not be bracketed (e.g., line 40).
- The quasi-experimental design is appropriate given the context. The authors have justified their methodological choices, although it remains unclear how tutorB@t was evaluated. Since the effect of feedback delivered through GEN AI is being evaluated, the control group should not use GEN AI. As tutorB@t is an AI tool that uses GEN AI, it should belong to the experimental group, yet the description of the experimental group is limited to the use of tutorBot+. Clarification would be appreciated.
- Details are missing regarding the pre- and post-tests: the authors should clarify when these tests were taken, how they were designed, and whether they are validated instruments. The authors should also clarify whether different tests were used for the experimental and control groups and, if so, provide a clear justification for this decision. It would further help to state explicitly whether the pre- and post-tests differed for the same group, since the poorer performance after the intervention is attributed to increased difficulty of the course content; clarifying this distinction would help interpret the results more accurately.
- With the exception of the above points, the methods are sufficiently described to allow replication, including clear descriptions of the software prototypes (tutorB@t and tutorBot+) and the underlying infrastructure (Judge0, GPT-4o-mini). The authors also provide the source code for tutorBot+.
- The authors transparently report that there is insufficient evidence to conclusively determine the effectiveness of GEN AI feedback on passing rates.
- The conclusions are appropriately cautious and justified by the results presented. The acknowledgment of unresolved questions, such as group homogeneity and external variables, adds credibility.
The manuscript is written in clear and professional English, with unambiguous phrasing and a well-organized narrative throughout. The Introduction provides sufficient context and clearly articulates the motivation behind the study, effectively situating the work within current developments in AI-based feedback systems for programming education. The background is thorough and well referenced, drawing upon a balanced selection of contemporary and foundational literature relevant to both computer science education and generative AI. The structure of the paper aligns well with PeerJ formatting and general discipline norms, with any deviations implemented to enhance clarity and readability. While the work is more applied than theoretical, formal constructs such as experimental design, evaluation metrics, and usability measures are clearly defined. Results are supported by appropriate statistical analyses and visual aids. As the work does not rely on formal mathematical theorems, detailed proofs are not required, and the presentation of results is appropriate for the nature of the study.
The article falls well within the Aims and Scope of the journal and is appropriate for the selected article type. It presents a rigorous investigation conducted to a high technical and ethical standard, with formal approval and informed consent procedures in place. The methods are described with sufficient detail, including the development of AI-powered tools, system architecture, hardware specifications, and access to both code and deployed platforms, allowing for potential replication. While data preprocessing is mentioned, a more comprehensive explanation of how excluded, incomplete, or noisy data were treated would enhance transparency. Evaluation methods, including the use of SUS scores, test case-based assessments, and appropriate statistical analyses (e.g., t-tests), are well presented and justified. The article draws on a wide and relevant range of sources, all of which are cited appropriately and integrated effectively into the discussion.
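As an aside for readers unfamiliar with the usability metric mentioned above, SUS scores follow a standard, well-documented scoring rule (odd items contribute response minus 1, even items contribute 5 minus response, and the sum is scaled by 2.5). A minimal Python sketch of that rule, offered purely as an illustration and not as the authors' actual analysis code:

```python
# Standard System Usability Scale (SUS) scoring for one respondent.
# `responses` holds the ten Likert answers (1 = strongly disagree ... 5 = strongly agree).
def sus_score(responses):
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    # Odd-numbered items contribute (response - 1); even-numbered items contribute (5 - response).
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)
        for i, r in enumerate(responses)
    ]
    # The raw sum (0-40) is scaled to the familiar 0-100 range.
    return sum(contributions) * 2.5

# Example: a fairly positive respondent scores 82.5.
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 2]))
```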
The experiments and evaluations are generally performed satisfactorily, with a well-structured quasi-experimental setup and appropriate use of statistical analysis to interpret the results. The paper presents a clear argument aligned with the goals stated in the Introduction, particularly the exploration of generative AI in providing feedback for programming education. However, while the tools' usability and student perceptions are well analyzed, the findings on student performance lack a conclusive interpretation, especially given the unexpected decline in post-intervention scores for the experimental group. A more nuanced discussion of potential confounding variables and their impact on the results would strengthen the paper’s validity. The Conclusion is clearly written and appropriately limited to the presented findings. It also acknowledges key limitations such as group non-homogeneity and outlines future directions including repeated experiments and expanded accessibility features. To further enhance the manuscript, a more direct reflection on why GEN AI feedback did not yield the anticipated performance gains would be valuable.
The manuscript presents a timely and relevant study exploring the integration of generative AI tools in programming education, with strong practical implications for improving feedback delivery in large-scale learning environments. The dual-platform approach, conversational and non-conversational, adds value by offering comparative insights into user experience and educational impact. The clarity of writing, inclusion of open-source access, and ethical transparency are commendable. To further strengthen the contribution, the authors may consider elaborating on specific use-case scenarios where these tools are most effective, as well as expanding on how the platform could be adapted or scaled across different programming curricula or institutional contexts. Additionally, minor editorial polishing would enhance the overall readability.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.