Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on February 19th, 2025 and was peer-reviewed by 5 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on April 16th, 2025.
  • The first revision was submitted on May 22nd, 2025 and was reviewed by 3 reviewers and the Academic Editor.
  • The article was Accepted by the Academic Editor on June 26th, 2025.

Version 0.2 (accepted)

· Jun 26, 2025 · Academic Editor

Accept

All the reviewers recommend accepting the paper.

[# PeerJ Staff Note - this decision was reviewed and approved by Jyotismita Chaki, a PeerJ Section Editor covering this Section #]

Reviewer 1 ·

Basic reporting

The authors have responded to all of the raised concerns properly and completely.

Experimental design

Please see above.

Validity of the findings

Please see above.

Additional comments

None

Reviewer 3 ·

Basic reporting

The manuscript is well written, in clear English. I believe the introduction section provides a clear understanding of the topic and demonstrates the need for automatic classification of psychiatric records, especially for anxiety disorders. I also think the literature review is quite comprehensive, addressing the topics on which the manuscript is based. Furthermore, the manuscript's structure is appropriate for a scientific paper; it includes the appropriate sections. I am struck by the use of formal definitions and equations, even though this is not a mathematically focused study.

Experimental design

The manuscript presents original research appropriate to the journal's scope. The research question is clear and addresses an interesting topic: the use of classification algorithms for mental health issues, approached from both a technological and a psychiatric perspective.
I consider the dataset to be quite large and of good quality, and it has been processed in an innovative way based on the use of AI and expert knowledge.
Furthermore, the processing methods used, as well as the sampling and data balancing methods, are appropriate for the type of study and are clearly explained. The inclusion of a repository allows for replication of the experiments.

Validity of the findings

The findings are important and based on a solid experimental process. The metrics used to determine model performance are appropriate for the type of data being processed and the classification models used.
Tuning hyperparameters proves to be a good strategy for improving model performance. However, the size of the dataset may be a limiting factor for improving performance; that is, a larger dataset could generate better results.
The conclusions are aligned with the results obtained and presented. The authors highlight the use of AI to improve the diagnosis of psychiatric patients.

Additional comments

It is a well-motivated and relevant study with real-world clinical significance; it is also transparent and reproducible, thanks to good preprocessing and expert validation.
However, the manuscript could be improved with a slightly more extensive explanation of how ChatGPT was combined with expert validation for the text classification step.

Reviewer 4 ·

Basic reporting

The authors have provided reasonable explanations or revised the manuscript in response to most of the reviewers' comments.

Experimental design

No further comment

Validity of the findings

No further comment

Version 0.1 (original submission)

· Apr 16, 2025 · Academic Editor

Major Revisions

In line with the reviewers' comments, you should clarify some methodological and experimental aspects and provide a deeper comparison with other state-of-the-art (SOTA) works.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1 ·

Basic reporting

Please see the attached document

Experimental design

Please see the attached document

Validity of the findings

Please see the attached document

Additional comments

Please see the attached document

Reviewer 2 ·

Basic reporting

I have carefully reviewed the manuscript titled "Classification of Psychiatry Clinical Notes by Diagnosis: A Deep Learning and Machine Learning Approach" and consider that it meets the required quality standards for publication. The article is written in clear, professional, and unambiguous English, with precise technical expression and an appropriate tone for an academic publication. The language is consistent and facilitates easy comprehension of the content.
The manuscript provides a solid introduction and adequate context, clearly framing the research within the field of study. Relevant and up-to-date references are included, demonstrating a thorough understanding of the state of the art and allowing readers to grasp the significance of the work within the discipline. The literature review is sufficient and well-founded, clearly establishing the contribution of the study.
Regarding structure, the article follows an appropriate and well-organized format, presenting each section logically and coherently. The figures and tables are of high quality, correctly labeled, and essential for understanding the results. Additionally, raw data has been shared in accordance with best open-access practices, ensuring transparency and reproducibility.
The study is self-contained and presents results relevant to the stated hypothesis. The research is developed comprehensively, without unnecessary fragmentation, and provides well-supported conclusions. Key terms and concepts are clearly defined, and formal results are presented with precision, ensuring theoretical and methodological rigor.
In conclusion, I find the manuscript to be solid, well-structured, and in compliance with the requirements for publication. I do not identify any significant observations that would require substantial modifications.

Experimental design

This is an original and primary research study that falls within the scope and objectives of the journal, making a significant contribution to its field of study. The article presents a clearly defined, relevant, and meaningful research question, precisely identifying the knowledge gap it aims to address and detailing how its contribution effectively fills this gap.
The study has been conducted with a high level of technical and methodological rigor, ensuring solid and reliable results. It is evident that the research has been carried out in accordance with the highest ethical standards, complying with all current regulations in the discipline. Furthermore, the methodology has been described in sufficient detail to allow for the study’s replicability, providing the necessary information for other researchers to reproduce the experiments and validate the findings.
In conclusion, this article represents a valuable contribution to the scientific community, combining well-founded research with impeccable technical execution. I find no aspects that require significant modifications and commend the authors for their excellent work.

Validity of the findings

I consider that it meets the required quality standards for publication. The study presents well-founded research with a clear justification that supports its relevance within the existing literature. In the case of a replication study, the authors have adequately described the rationale and the benefit it brings to the scientific community, demonstrating its value in terms of validation, performance comparison, and methodological rigor.
All underlying data have been provided and meet the standards of robustness, statistical validity, and proper control. The transparency in data availability strengthens the study’s reproducibility, allowing other researchers to independently validate the findings. The statistical analysis has been handled carefully, ensuring that the derived conclusions are solid and supported by the presented evidence.
The conclusions are clearly stated and well linked to the original research question, without overinterpretations or unwarranted extrapolations. The study remains within the boundaries of its findings, providing a precise analysis aligned with the obtained results.
In conclusion, this article represents a well-structured and rigorous contribution, with solid data and a coherent analysis. I do not identify any aspects that require significant modifications and commend the authors for their excellent work.

Reviewer 3 ·

Basic reporting

I believe the authors use clear and concise language when explaining each phase of their research. They guide the reader in a friendly manner throughout the document. The bibliography is accurate, as is the structure of the article, which adheres to academic standards. Due to the nature of the data, it is understandable that they cannot be shared; however, the authors provide a clear description of the data. The results and methodology are clear and understandable for an expert in the area of knowledge to which the document belongs.

Experimental design

Regarding the experimental design, the use of LLM tools for the text classification process caught my attention. Even though experts were employed for the subsequent phase, the approach is innovative, and this strategy may have helped classify the texts better before training the models.
On the other hand, oversampling techniques are quite beneficial when there is insufficient data, as illustrated in the sketch below.
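
For illustration, here is a minimal, self-contained sketch of what such an oversampling step could look like, using imbalanced-learn's SMOTE on TF-IDF features. The toy notes, the label coding, and the train/test split are assumptions made for the example, not the authors' actual data or pipeline.

```python
# Minimal sketch: SMOTE oversampling for an imbalanced two-class text task.
# The toy corpus and label coding below are placeholders, not the study's data.
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Hypothetical imbalanced corpus: 20 "anxiety" notes vs. 8 "adjustment" notes.
notes = [f"patient reports persistent worry and panic episodes, case {i}" for i in range(20)]
notes += [f"symptoms emerged after a stressful life event, case {i}" for i in range(8)]
labels = [1] * 20 + [0] * 8  # 1 = Anxiety Disorder, 0 = Adjustment Disorder (assumed coding)

X = TfidfVectorizer().fit_transform(notes)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=42
)

# Oversample the minority class in the training split only,
# so the evaluation data remain untouched.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
print(Counter(y_train), "->", Counter(y_bal))
```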

Validity of the findings

The findings are of great scientific value thanks to the use of NLP algorithms. In practice, algorithms like these could be used to create online tools (probably bots) that allow experts to obtain structured information and classify patient data, thereby helping to address the shortage of experts and reduce their workloads.

Additional comments

It's clear that mental health today has taken on a newfound importance that was minimized in previous years.

In that sense, I believe that leveraging technological advances such as AI to create tools that help address the shortage of experts in this area is of great value.

The document is generally good. It offers a very good strategy based on Artificial Intelligence for both training models and using them in text classification processes in the area of mental health, which could reduce the workload of experts and provide care to more patients.

Reviewer 4 ·

Basic reporting

#1. The first part of the introduction lacks sufficient explanation of the research background and purpose. In particular, the dichotomous explanation of psychology and psychiatry would likely be difficult to gain consensus on from mental health professionals. While access to mental health professionals may be challenging depending on the country and society, supporting diagnostic classification from clinical notes, as in this study, does not significantly improve this accessibility by reducing workload. This is because the most time-consuming aspects are directly interviewing patients, collecting information, and documenting these records. Since the psychiatric diagnosis is formed in the clinician's mind during this process, estimating diagnoses from medical records after they have been documented has limited clinical significance. It merely double-checks the clinician's judgment, and since most clinicians write clinical notes in a way that supports their presumed diagnosis, this retrospective verification holds little value.

#2. Figure 1 is not informative. This plot may be transformed into a histogram.

#3. Figure 3 does not need to be presented as a graph and should be converted to a table instead, with additional demographic information displayed according to diagnosis.

#4. Figures 4 and 5 duplicate the results already presented in the tables.

#5. As a minor point, the paper's reference format does not follow PeerJ's guidelines.

Experimental design

This study compared traditional machine learning approaches (Random Forest, SVM, KNN, Decision Tree, XGBoost) and deep learning models (DistilBERT, SciBERT) for classifying clinical notes into Anxiety and Adjustment Disorder categories. While oversampling techniques had minimal impact overall (except SMOTE with BERT models), hyperparameter optimization significantly improved model accuracy across all approaches. Both Decision Tree and XGBoost achieved 96% accuracy among machine learning approaches, matching the performance of DistilBERT and SciBERT in the deep learning category.

#1. The authors collected 12,921 clinical notes from hospital psychiatric electronic health records (EHR) for their research. Clinical notes are recorded during each visit for outpatients and on a daily basis for inpatients. The authors did not specify whether these records were from outpatients or inpatients. For outpatients, extensive information is typically documented during the first visit, with follow-up notes documenting progress. Similarly, for inpatients, detailed information is recorded in admission notes, while subsequent progress notes document ongoing developments. There is insufficient information about how many patients these 12,921 clinical notes represent (i.e., whether they include multiple visit records from the same patients), and what proportion consists of first visit/admission notes versus progress notes.

#2. Clinical notes following the SOAP format include the diagnosis in the Assessment section, which creates a problem: the training data may contain the very diagnoses the model is trying to predict. The authors mention that they extracted diagnoses from the clinical notes using an LLM during the preprocessing stage, but there is no mention of whether these diagnoses were subsequently excluded from the clinical notes used to train the prediction models. This needs to be clarified (a minimal sketch of stripping this section appears after these comments).

#3. The anxiety disorder diagnosis being predicted in this study is heterogeneous. In the DSM-5 diagnostic system, it could include specific phobia, panic disorder, generalized anxiety disorder, social anxiety disorder, anxiety disorder NOS, and others. I am wondering whether the authors considered these specific diagnostic subcategories as anxiety disorder in their analysis.

#4. The authors did not perform K-fold cross-validation in their study. This seems necessary at least during the hyperparameter tuning process, so what was the reason for not implementing this approach? (A sketch of what this could look like appears below.)
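
Regarding point #2, the following is a minimal sketch of how the Assessment section could be stripped from a SOAP-formatted note before training, so the label is not leaked to the model. The section headers and note layout are assumptions about how such notes might look, not the study's actual format.

```python
# Sketch for point #2: remove the Assessment section (which contains the
# diagnosis) from a SOAP note before using it as training text.
# The header names here are assumed, not taken from the authors' data.
import re

def drop_assessment(note: str) -> str:
    """Delete the text between the 'Assessment:' header and the next
    SOAP header (or the end of the note)."""
    return re.sub(
        r"Assessment:.*?(?=(Subjective:|Objective:|Plan:|$))",
        "",
        note,
        flags=re.DOTALL | re.IGNORECASE,
    )

note = (
    "Subjective: reports constant worry. "
    "Objective: restless, tense. "
    "Assessment: Generalized Anxiety Disorder. "
    "Plan: start CBT."
)
print(drop_assessment(note))
# -> Subjective: reports constant worry. Objective: restless, tense. Plan: start CBT.
```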
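
For context on point #4, here is a minimal sketch of K-fold cross-validation inside a hyperparameter search, using scikit-learn's GridSearchCV with a stratified 5-fold split. The synthetic data, the RandomForest model, and the parameter grid are illustrative assumptions, not the authors' reported configuration.

```python
# Sketch for point #4: K-fold cross-validation during hyperparameter tuning.
# Every candidate in the grid is scored as the mean over the 5 folds,
# which gives a more stable estimate than a single train/validation split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the (imbalanced) clinical-note features.
X, y = make_classification(n_samples=300, n_features=20, weights=[0.7, 0.3], random_state=0)

grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},  # assumed grid
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),  # the K folds
    scoring="f1",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```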

Validity of the findings

It looks OK.

Reviewer 5 ·

Basic reporting

There are many similar top-journal papers from within the past year, so please find them, analyze them, and then discuss how your work differs from them.

Experimental design

There are many similar top-journal papers from within the past year, so please find them, analyze them, and then use them to strengthen the performance evaluation comparison.

Validity of the findings

There are many similar top-journal papers from within the past year, so please find them, analyze them, and then use them to strengthen the performance evaluation comparison.

Additional comments

This research has contributed to the field of clinical text classification by examining the effectiveness of different machine learning models in distinguishing between patients diagnosed with Adjustment Disorder and Anxiety Disorder based on clinical notes. Several important findings emerged from this study, highlighting the strengths and limitations of the models employed, as well as the impact of applying oversampling techniques to address class imbalance in the dataset. However, there are many similar top-journal papers from within the past year, so please find them, analyze them, and then use them to strengthen the performance evaluation comparison.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.