Automatically generating psychiatric case notes from digital transcripts of doctor-patient conversations using text mining
- Subject Areas
- Bioinformatics, Computational Biology, Data Mining and Machine Learning, Natural Language and Speech
- Keywords
- Machine Learning, Text Mining, Psychiatry, Case notes, Mental Health, EHR, Natural Language Processing
- Copyright
- © 2019 Kazi et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2019. Automatically generating psychiatric case notes from digital transcripts of doctor-patient conversations using text mining. PeerJ Preprints 7:e27497v2 https://doi.org/10.7287/peerj.preprints.27497v2
Abstract
Current health care systems require clinicians to spend a substantial amount of time digitally documenting their interactions with patients in electronic health records (EHRs), which limits the time available for face-to-face patient care. Moreover, EHRs are known to be highly inefficient because of the additional time needed to complete them, which also contributes to clinician burnout. In this project, we explore the feasibility of developing an automated case notes system for psychiatrists that uses text mining techniques to listen to doctor-patient conversations, generate digital transcripts using speech-to-text conversion, classify information from the transcripts into relevant categories, and automatically generate structured case notes.
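The pipeline described above can be illustrated with a minimal Python skeleton that starts from a transcript a speech-to-text system has already produced. This is a sketch under stated assumptions, not the authors' implementation. The `Utterance` type, the section names, the keyword rules in `classify` (a stand-in for a trained classifier), and the lookup table in `to_third_person` (a crude stand-in for POS tagging and verb conjugation) are all hypothetical. Skipping the doctor's turns is also an assumption.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical section labels: the paper uses six broad categories
# (e.g. medical and family history); two are sketched plus a catch-all.
SECTIONS = ("medical_history", "family_history", "other")

# Hypothetical keyword rules standing in for the trained classifier.
KEYWORDS = {
    "medical_history": {"diagnosed", "medication", "hospital"},
    "family_history": {"mother", "father", "sister", "brother"},
}

# Hypothetical first-person -> third-person rewrites standing in for the
# POS-tagging and verb-conjugation step; longer patterns are listed first.
REWRITES = [
    (("i", "am"), ("the", "patient", "is")),
    (("i", "was"), ("the", "patient", "was")),
    (("i", "have"), ("the", "patient", "has")),
    (("i",), ("the", "patient")),
    (("my",), ("their",)),
    (("me",), ("them",)),
]


@dataclass
class Utterance:
    speaker: str  # "doctor" or "patient"
    text: str     # output of the (out-of-scope) speech-to-text stage


def classify(text: str) -> str:
    """Assign a section label to one utterance using keyword overlap."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    for section, keys in KEYWORDS.items():
        if words & keys:
            return section
    return "other"


def to_third_person(text: str) -> str:
    """Rewrite a first-person patient statement as a case-note sentence."""
    tokens = text.strip().rstrip(".!?").split()
    out, i = [], 0
    while i < len(tokens):
        for src, dst in REWRITES:
            window = tuple(tok.lower() for tok in tokens[i:i + len(src)])
            if window == src:
                out.extend(dst)
                i += len(src)
                break
        else:  # no pattern matched: copy the token unchanged
            out.append(tokens[i])
            i += 1
    sentence = " ".join(out)
    if not sentence:
        return ""
    return sentence[0].upper() + sentence[1:] + "."


def generate_case_note(transcript: Iterable[Utterance]) -> Dict[str, str]:
    """Classify patient turns and aggregate rewritten sentences per section."""
    sections = defaultdict(list)
    for utt in transcript:
        if utt.speaker == "patient":  # assumption: doctor turns are skipped
            sections[classify(utt.text)].append(to_third_person(utt.text))
    return {name: " ".join(sections[name]) for name in SECTIONS if sections[name]}
```

Each key of the returned dictionary corresponds to one section of the case note. Replacing `classify` with a trained model, or `to_third_person` with a POS-tagger-driven rewriter, would leave the aggregation step unchanged.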
In our preliminary work, we develop a human-powered annotator for doctor-patient conversation transcripts and obtain a gold standard dataset through the National Alliance on Mental Illness (NAMI) Montana. We model the task of classifying parts of conversations into six broad categories, such as medical and family history, as a supervised classification problem and apply several popular machine learning algorithms. In our preliminary experiments using 5-fold cross-validation, Support Vector Machines classify an unseen transcript with an average AUROC (area under the receiver operating characteristic curve) of 89%. Finally, using part-of-speech (POS) tagging, the grammatical rules of English, and verb conjugation, we generate formal written versions of the text segments belonging to each category. These formal texts are then aggregated to fill the corresponding sections of the EHR forms.
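The evaluation protocol above (supervised classification, 5-fold cross-validation, averaged AUROC) can be illustrated with a dependency-free Python sketch. It is not the authors' setup and makes several assumptions. Features are raw bag-of-words counts. The linear SVM is trained with the Pegasos stochastic subgradient method, a swapped-in solver since the abstract does not name one. Each category is scored one-vs-rest and the per-category scores are averaged, and folds are not stratified. All function names and hyperparameters (`lam`, `iters`, `seed`) are hypothetical. AUROC is computed as the probability that a randomly chosen positive example scores above a randomly chosen negative one.

```python
import random
import re
from collections import Counter


def featurize(text):
    """Bag-of-words counts plus a constant bias feature (sparse dict)."""
    features = Counter(re.findall(r"[a-z']+", text.lower()))
    features["__bias__"] = 1.0  # bias folded in as a (regularised) feature
    return features


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, iters=2000, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularised hinge loss.

    ys are +1/-1 labels; returns a sparse weight dict.
    """
    rng = random.Random(seed)
    w = {}
    for t in range(1, iters + 1):
        i = rng.randrange(len(xs))
        eta = 1.0 / (lam * t)
        violated = ys[i] * dot(w, xs[i]) < 1.0  # margin check uses the old w
        for f in w:
            w[f] *= 1.0 - eta * lam  # shrink step from the regulariser
        if violated:
            for f, v in xs[i].items():
                w[f] = w.get(f, 0.0) + eta * ys[i] * v
    return w


def auroc(scores, labels):
    """P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auroc(texts, labels, category, k=5, seed=0):
    """Mean one-vs-rest AUROC for `category` over k (unstratified) folds."""
    xs = [featurize(t) for t in texts]
    ys = [1 if lab == category else -1 for lab in labels]
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    fold_scores = []
    for j in range(k):
        test = idx[j::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([xs[i] for i in train], [ys[i] for i in train], seed=seed)
        auc = auroc([dot(w, xs[i]) for i in test], [ys[i] for i in test])
        if auc is not None:
            fold_scores.append(auc)
    return sum(fold_scores) / len(fold_scores) if fold_scores else None


def macro_auroc(texts, labels, k=5, seed=0):
    """Average the per-category AUROC over all categories present."""
    per_class = [cross_validated_auroc(texts, labels, c, k, seed)
                 for c in sorted(set(labels))]
    per_class = [a for a in per_class if a is not None]
    return sum(per_class) / len(per_class) if per_class else None
```

Calling `macro_auroc(texts, labels)` on annotated transcript segments returns a category-averaged score. Folds whose test split contains only one class are skipped, because AUROC is undefined there; a stratified split would avoid most such folds.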
Author Comment
Changes were made to the title and the abstract.