All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Dear authors, we are pleased to confirm that you have addressed the reviewers' valuable feedback and improved your research accordingly.
Thank you for considering PeerJ Computer Science and submitting your work.
Kind regards,
PCoelho
[# PeerJ Staff Note - this decision was reviewed and approved by Massimiliano Fasi, a PeerJ Section Editor covering this Section #]
Dear authors,
Thanks a lot for your efforts to improve the manuscript.
Nevertheless, some concerns remain that need to be addressed.
As before, you are advised to respond critically to the remaining comments point by point when preparing the new version of the manuscript and the accompanying rebuttal letter.
Kind regards,
PCoelho
The paper reports on a computational approach to labeling discharge diagnosis codes from electronic health record data that was designed and evaluated using retrospective data from a hospital in Colombia. The paper's introduction and motivation have much improved since the last iteration, and overall, the justification for performing this research is considerably clearer and more convincing. The authors have included more analyses, which help to answer the research questions related to public health surveillance, and these make the paper a more complete, self-contained contribution. Overall, I believe that this paper is almost ready for publication and needs just a few minor edits before acceptance.
The design of the system is now well-justified and explained, and the evaluations performed are appropriate. The co-occurrence analysis is helpful and brings greater depth to the results.
Overall, the findings are clearer, and their validity is stronger because the paper's claims are now scoped appropriately. One concern about the new results is Table 10, the direct GPT-3.5-Turbo analysis: why are results provided for only two conditions? If this is due to cost limitations or sample sizes, it should be mentioned around lines 464-465; otherwise, it may seem as though the results here were cherry-picked to show only the cases with low GPT performance.
The language in the paper is improved, and most typos have been addressed. I still noticed one possible AI artifact that should be removed (line 401 in the unmarked revision). Also, the terms "inducer" and "inductor" are both used in the paper, and it would be good to be consistent. I would suggest using "inducer" throughout, as "inductor" is typically an electrical engineering term.
The overall clarity of the article is good, and the structure is also well-organized.
The literature review provides sufficient background, and all the references are appropriate.
The technical aspects of the proposed system are sound and are described with sufficient methodological detail and relevant data.
Conclusions are well stated.
**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.
**Language Note:** When you prepare your next revision, please either (i) have a colleague who is proficient in English and familiar with the subject matter review your manuscript, or (ii) contact a professional editing service to review your manuscript. PeerJ can provide language editing services - you can contact us at [email protected] for pricing (be sure to provide your manuscript number and title). – PeerJ Staff
This paper presents a technical description and evaluation of a system designed to predict clinical diagnosis codes from a variety of structured databases and unstructured text reports. Overall, the narration and language are mostly clear. Building AI-powered systems to operate on patient data is always challenging, and the authors have done a commendable job developing a system that seems useful in their hospital context.
However, as it currently stands, it's not exactly clear what problem or decision the system is intended to support, and the motivation for building the system is unclear. In clinical practice, at least in the United States, discharge summaries are always associated with ICD codes to help calculate the cost of care for billing purposes. Why then is it important to *predict* these diagnosis codes if they are already regularly annotated by clinicians?
Also, it appears that the motivation for predicting discharge diagnoses is to facilitate analyses of population health, morbidity, and mortality. However, this application would be more appropriately termed a public health surveillance tool or a data mining tool rather than a clinical decision support system, so the use of the term CDSS throughout the paper is confusing.
The paper does not engage sufficiently with existing literature around using discharge diagnosis codes for prediction, a topic that has been extensively explored. Here are two example papers that the authors could use as a starting point for a more extensive literature review:
Perotte, A., Pivovarov, R., Natarajan, K., Weiskopf, N., Wood, F., & Elhadad, N. (2014). Diagnosis code assignment: models and evaluation metrics. *Journal of the American Medical Informatics Association*, *21*(2), 231–237. https://doi.org/10.1136/amiajnl-2013-002159
Masud, J. H. B., Kuo, C.-C., Yeh, C.-Y., Yang, H.-C., & Lin, M.-C. (2023). Applying Deep Learning Model to Predict Diagnosis Code of Medical Records. *Diagnostics*, *13*(13), 2297. https://doi.org/10.3390/diagnostics13132297
The authors may review this related work and make a stronger case in their introduction about why the specific system they built addresses a still-existent challenge.
**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors are in agreement that they are relevant and useful.
The experimental design here seems appropriate for the journal. The system developed appears well thought-out and flexible, and I appreciate that the authors provided code and instructions along with the paper.
The methodology could be presented more clearly to demonstrate the rigor of the development process. One issue is that it is unclear how much data was associated with ground-truth diagnosis codes at the outset, how much data was labeled by experts, and how many experts were engaged. Based on my reading of the paper, it seems that the discharge codes already existing in the hospital databases were not used at any point, which is surprising and should be explained further.
Additionally, the system is never compared to any baseline system, so it is difficult to assess whether the choices made in the system’s design were actually effective. At the very minimum, it would be useful to compare the full pipeline to the results that would be obtained by using each component of the system individually (e.g., the similarity lookup alone, GPT 3.5 alone).
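For instance, a simple ablation harness along the following lines would suffice. This is only a sketch in Python; the predictor names `similarity_lookup`, `gpt_predict`, and `full_pipeline` are hypothetical stand-ins for whatever interfaces the system actually exposes:

```python
# Hypothetical ablation harness: score each configuration of the system on the
# same labeled test set. All predictor names below are illustrative stand-ins,
# not the authors' actual functions.

def accuracy(predict, test_set):
    """Fraction of (record, gold_code) pairs where the prediction matches."""
    hits = sum(1 for record, gold_code in test_set if predict(record) == gold_code)
    return hits / len(test_set)

# Example usage, assuming each component is wrapped as record -> ICD code:
# for name, predict in [("similarity lookup only", similarity_lookup),
#                       ("GPT-3.5 only", gpt_predict),
#                       ("full pipeline", full_pipeline)]:
#     print(f"{name}: accuracy = {accuracy(predict, test_set):.3f}")
```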
I would have liked to see a more in-depth analysis of the extracted diagnosis codes with respect to population morbidity and mortality, given that the paper is initially motivated by these concepts. Currently, there is very little description of the results (lines 355-365) from the experiments in Section 1.7, and Figures 5-7 don't convey much useful information. It would be more interesting to see, for example, the relationship between these diagnosis codes and mortality rates, or how often certain diagnosis codes co-occur. Moreover, it would be useful to compare the distribution of diagnosis codes assigned by the algorithm (presumably on an unlabeled dataset) to the distribution of diagnosis codes in the labeled dataset, to see whether the algorithm improves coverage of codes that are underrepresented in the labeled data. This would also help demonstrate the usefulness of the system in population health monitoring.
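To illustrate the kind of comparison I have in mind, here is a minimal Python sketch that contrasts the relative frequency of codes in the labeled data with the codes assigned by the system; the code lists are toy placeholders, not the authors' data:

```python
# Toy sketch: compare relative ICD code frequencies in the expert-labeled set
# against codes assigned by the system on unlabeled records. Both lists are
# illustrative placeholders.
from collections import Counter

def code_frequencies(codes):
    """Map each ICD code to its relative frequency in the list."""
    counts = Counter(codes)
    total = len(codes)
    return {code: n / total for code, n in counts.items()}

labeled_codes = ["S06.5", "S06.5", "I61.9", "S02.0"]             # placeholder
predicted_codes = ["S06.5", "I61.9", "I61.9", "S02.0", "S06.4"]  # placeholder

labeled = code_frequencies(labeled_codes)
predicted = code_frequencies(predicted_codes)

# Codes much more frequent in the system's output than in the labeled data
# would suggest improved coverage of underrepresented diagnoses.
for code in sorted(set(labeled) | set(predicted)):
    print(f"{code}: labeled={labeled.get(code, 0.0):.2f}, "
          f"predicted={predicted.get(code, 0.0):.2f}")
```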
The findings that are presented are mostly rigorous, but could also be improved. Specifically, it would be useful to indicate the uncertainty of some of the metrics, such as the average scores in Tables 3 and 6.
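A nonparametric bootstrap would be one straightforward way to report this uncertainty; below is a minimal sketch, assuming per-record scores are available as a list (the scores shown are placeholders, not values from the paper):

```python
# Minimal percentile-bootstrap 95% confidence interval for a mean score.
import random

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `scores`."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(scores, k=len(scores))) / len(scores)
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

scores = [0.8, 0.9, 0.7, 1.0, 0.85]  # placeholder per-record scores
low, high = bootstrap_ci(scores)
print(f"mean = {sum(scores) / len(scores):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```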
Also, the results described in lines 337-345 and Table 5 seem arbitrary. It's nice that the authors developed a specific way to improve the results in one category, but it’s not justified why that method makes sense, and it is not described in the Methods. I would suggest removing that result and instead analyzing why the system tends to err on the Epidural Hematoma class, or alternatively applying the proposed sentence similarity module improvement to all classes and presenting those results.
The authors should do a thorough proofreading check and also ensure there are no artifacts from using LLMs to assist with the writing. For example, lines 322-323 should be corrected, and many of the in-text citations need to be parenthesized correctly.
Need more references.
Write one paragraph about the novelty and limitations of your work.
Some of the articles that you may read for the literature review:
A. A clinical decision support system using rough set theory and machine learning for disease prediction
B. An intelligent recommender system using machine learning, association rules, and rough sets for disease prediction from an incomplete symptom set
C. Clinical decision support system based on RST with machine learning for medical data classification
D. Churn prediction of a clinical decision support recommender system
**PeerJ Staff Note:** It is PeerJ policy that any additional references suggested during peer review should only be included if the authors find them relevant and useful.
Incorporated all the suggestions point by point.
1. How does the integration of both structured and unstructured data contribute to the overall accuracy and effectiveness of the proposed Clinical Decision Support System (CDSS), and what are the specific challenges addressed in processing each type of data within the healthcare context?
2. Include the following related work: "An analytical study on machine learning techniques"; "Introduction to artificial intelligence and current trends"; "A three-stage novel framework for efficient and automatic glaucoma classification from retinal fundus images".
3. Describe the Natural Language Processing (NLP) techniques used in the unstructured data processing pipeline of the CDSS. How do the BERT model for negation and uncertainty detection and the Sentence Transformer model for semantic similarity contribute to accurate diagnosis generation? (An illustrative sketch of such a similarity lookup follows this list.)
4. Explain the architecture and implementation of the CDSS, focusing on the roles of the relational database, API framework, and machine learning models. How does this architecture ensure scalability, integration with existing healthcare systems, and real-time diagnosis support?
5. What is the nature and design of the quantitative scoring metric used to evaluate the CDSS’s performance? How does this metric reflect the system's diagnostic accuracy, and what evidence is presented to demonstrate the CDSS's reliability and validity?
6. In what ways does the proposed CDSS avoid disruption to existing healthcare workflows and systems? Discuss the strategies used to ensure ease of adoption and integration in a clinical setting, particularly regarding data interoperability and system compatibility.
7. Critically evaluate the preliminary morbidity statistics generated by the CDSS. What insights do these statistics provide about the system’s diagnostic capabilities, and how might expanding the dataset and refining diagnostic patterns improve future outcomes?
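Regarding question 3, the following minimal sketch shows the kind of Sentence Transformer similarity lookup presumably meant there. It assumes the sentence-transformers library; the model name and example texts are placeholders rather than the authors' actual configuration:

```python
# Illustrative semantic-similarity lookup with a Sentence Transformer model.
# Model name, candidate diagnoses, and the note text are all placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

diagnoses = ["Traumatic epidural hematoma", "Subdural hemorrhage",
             "Skull base fracture"]
note = "CT shows an extra-axial hyperdense collection consistent with epidural bleed"

# Embed the note and the candidate diagnosis descriptions, then rank the
# candidates by cosine similarity to pick the closest description.
note_emb = model.encode(note, convert_to_tensor=True)
diag_embs = model.encode(diagnoses, convert_to_tensor=True)
scores = util.cos_sim(note_emb, diag_embs)[0]
best = scores.argmax().item()
print(diagnoses[best], float(scores[best]))
```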
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.