All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Thank you for your contribution to PeerJ Computer Science and for addressing the reviewers' final minor suggestions. We are satisfied with the revised version of your manuscript, which is now ready for acceptance. Congratulations!
I only suggest correcting the capitalization of "Pytorch" in the title: the proper form is PyTorch (with a capital T).
[# PeerJ Staff Note - this decision was reviewed and approved by Shawn Gomez, a PeerJ Section Editor covering this Section #]
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
Unfortunately, my comments on the previous submission were not addressed in the paper. For example, the figures and tables still appear only as placeholders, and the requested additional baselines were not added. While the authors state that they added remarks about addressing this in future work, that seems insufficient.
I still think that using only BERT is outdated. At the very least, the authors could evaluate several BERT variants, as I suggested, given that it is 2025. While the paper states that its scope is limited to BERT, that restriction weakens the paper's contribution.
The validity of the findings might be strengthened by using more recent pre-trained encoder models, as I suggested.
**Language Note:** When you prepare your next revision, please either (i) have a colleague who is proficient in English and familiar with the subject matter review your manuscript, or (ii) contact a professional editing service to review your manuscript. PeerJ can provide language editing services - you can contact us at [email protected] for pricing (be sure to provide your manuscript number and title). – PeerJ Staff
Most of the concerns from the previous round, such as the excessive focus on PyTorch and some vague expressions, were resolved during revision. However, the figures and tables are still not rendered in the main text and appear only as placeholders. I do not know the exact cause, possibly an issue with the submission system itself, but it should be fixed before the next submission.
The experimental design is significantly improved after the revision through the addition of many baselines. However, I still think that using only BERT is somewhat outdated; at the very least, the authors could evaluate several BERT variants.
The validity of the findings is improved after the revision. It can be strengthened by using more recent pre-trained encoder models.
The manuscript addresses a timely topic at the intersection of educational NLP and unsupervised learning.
Several aspects to improve:
Remove complex sentences and vague phrasing, particularly in the abstract. The introduction and discussion sections would benefit from clearer, more concise editing to improve readability.
Clarify the novelty and motivation of the proposed method relative to existing literature.
Incorporate standard clustering evaluation metrics in the results.
Expand the literature review to reflect recent methodological developments in both NLP and clustering.
While the overall methodology is sound, several areas require further elaboration to ensure rigor and reproducibility.
The justification for why a clustering approach is preferable to classification-based methods should be discussed in greater depth.
Provide more details regarding data preprocessing, feature extraction, and cluster determination:
> Specify the corpus size, data sources, and preprocessing steps, and state whether any bias mitigation was performed.
> If the dataset is publicly available, a citation or link should be provided. If proprietary, a brief summary of data collection ethics and anonymization procedures is necessary.
Provide more details about Embedding Generation with BERT:
> Which variant of BERT was used?
> Did the authors use the [CLS] token embedding, mean pooling, or another method to generate document representations?
> Was any fine-tuning performed on the BERT model, or were pre-trained embeddings used directly?
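To make the distinction concrete, the two common document-representation strategies I am asking about can be sketched as follows. This is a minimal illustration with dummy token embeddings, not the authors' actual pipeline; in practice the array would come from a BERT encoder's last hidden state.

```python
import numpy as np

# Dummy token embeddings for one document: (seq_len, hidden_dim).
# In a real pipeline these would be BERT's last hidden states.
rng = np.random.default_rng(0)
seq_len, hidden = 6, 8
token_embeddings = rng.normal(size=(seq_len, hidden))
attention_mask = np.array([1, 1, 1, 1, 0, 0])  # last two positions are padding

# Option 1: use the first ([CLS]) token's embedding as the document vector.
cls_vector = token_embeddings[0]

# Option 2: mean-pool over non-padding tokens using the attention mask,
# so padding positions do not dilute the average.
mask = attention_mask[:, None]                         # (seq_len, 1)
mean_vector = (token_embeddings * mask).sum(axis=0) / mask.sum()
```

Either vector can then be fed to the clustering step; the paper should state which one was used, since the choice measurably affects cluster quality.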
Provide more details about Clustering Algorithm Selection:
> The paper does not adequately justify the choice of clustering method. If K-Means was used, how was k determined?
> Were any experiments with alternative clustering approaches, such as HDBSCAN, Agglomerative Clustering, or Gaussian Mixture Models, conducted?
> If K-Means was chosen, was an elbow method or silhouette analysis used to optimize k?
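The kind of k-selection procedure I have in mind can be sketched in a few lines with scikit-learn. This is a generic illustration on synthetic data (not the authors' embeddings): the silhouette score is computed for a range of k values and the maximum is taken.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for document embeddings; three well-separated
# blobs so the "correct" k is known in advance.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher is better, range [-1, 1]

best_k = max(scores, key=scores.get)
```

Reporting such a curve (or the analogous elbow plot of inertia vs. k) would let readers verify that the number of clusters was chosen in a principled way.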
While the general framework of the study is reasonable, the methodology lacks sufficient quantitative validation. The authors should include:
Ablation Study: How do different feature representations affect clustering quality?
Baseline Comparisons: A simple TF-IDF + K-Means baseline would help contextualize the effectiveness of BERT embeddings.
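The suggested baseline is cheap to implement; a minimal sketch with a toy corpus standing in for student essays (illustrative only, not the authors' data) looks like this:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for argumentative essays.
docs = [
    "Homework should be reduced because students need rest.",
    "Reducing homework gives students more time to rest and play.",
    "School uniforms promote equality among students.",
    "Uniforms in school help equality and lower peer pressure.",
]

# TF-IDF features followed by K-Means; TF-IDF rows are L2-normalized,
# so Euclidean K-Means approximates cosine-based clustering.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
```

Reporting the same clustering metrics for this baseline and for the BERT-embedding pipeline would show how much the contextual representations actually contribute.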
The paper presents a clustering framework but does not provide sufficient insight into how the clusters can be interpreted in an educational context. Some important questions remain unanswered and should be addressed:
What specific argumentation types emerged from the clustering process?
Do the clusters align with any established argumentation models?
How could this clustering approach be applied in real-world educational settings?
Discussion of potential limitations and sources of error is required:
Did the dataset contain arguments from multiple disciplines or proficiency levels? If so, did this affect cluster coherence?
Were any checks performed to ensure that argumentation types were not unfairly grouped due to linguistic bias?
Would this approach generalize to larger datasets?
The manuscript has potential but requires major revisions before it can be considered for publication. The authors must provide stronger methodological transparency, quantitative evaluation, comparative analysis, and clearer discussion of interpretability and limitations.
Summary: This research article compares two models for analyzing students' argumentative essays. The paper shows that both models offer valuable contributions; PERSUADE excels at in-depth analysis of argument structure within a text, while the MDC model is effective at organizing information across multiple documents.
Writing:
- While generally understandable, some ambiguous passages, such as the following, should be revised.
Lines 24-28: change "potential" to "strength"; "involving the depth" is vague
Lines 65-67: "lack the incorporation of contextual features" is passive and wordy
Lines 180-183: change "classify arguments into specific categories or types" to "classify arguments into specific types"
- The background sections dedicated to explaining fundamental and widely known concepts like PyTorch are unnecessary for a computer science journal's target audience and should be removed or drastically condensed.
- The introduction sets the stage but doesn't sufficiently justify the specific experimental approach taken later in the paper.
- Also, why do the tables and figures appear as placeholders in the main text even though they are shown on later pages?
- Figure 4 should be reorganized as a table, and all results should be rounded consistently.
- The comparison presented is insufficient. The primary comparison is between the proposed MDC model and PERSUADE. PERSUADE, while related to the dataset's origin, is fundamentally a framework/corpus focused on discourse element identification within essays, not multi-document clustering.
- The study needs to compare the MDC model against established state-of-the-art document clustering algorithms, particularly those designed for or evaluated on similar argumentative/essay data.
- Also, a BERT-based model seems an outdated choice given the various large language models recently proposed in natural language processing. Even if the authors chose not to use LLMs, they should have evaluated other encoder-only models such as the DeBERTa or RoBERTa series.
- Claims in the paper that the MDC model provides a "robust solution" or that the models make "significant contributions" are currently overstated. Significance can only be claimed after comparison against appropriate, state-of-the-art baselines using relevant evaluation metrics for the specific task.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.