Generating case notes from digital transcripts using text mining
- Subject Areas
- Bioinformatics, Data Mining and Machine Learning, Natural Language and Speech
- Keywords
- Machine Learning, Text Mining, Psychiatry, Case notes, Mental Health, EHR, Natural Language Processing
- Copyright
- © 2019 Kazi et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2019. Generating case notes from digital transcripts using text mining. PeerJ Preprints 7:e27497v1 https://doi.org/10.7287/peerj.preprints.27497v1
Abstract
Current health care systems require clinicians to spend a substantial amount of time digitally documenting their interactions with patients in electronic health records (EHRs), which limits the time available for face-to-face patient care. Moreover, EHR use is known to be highly inefficient because of the additional time the documentation takes to complete, which also contributes to clinician burnout. In this project, we explore the feasibility of developing an automated case notes system for psychiatrists using text mining techniques that will listen to doctor-patient conversations, generate digital transcripts using speech-to-text conversion, classify information from the transcripts by identifying important keywords, and automatically generate structured case notes.
In our preliminary work, we develop a human-powered doctor-patient transcript annotator and obtain a gold standard dataset through the National Alliance on Mental Illness (NAMI) Montana. We model the task of classifying parts of conversations into six broad categories, such as medical and family history, as a supervised classification problem and apply several popular machine learning algorithms. According to our preliminary experimental results obtained through 5-fold cross-validation, Support Vector Machines are able to classify an unseen transcript with an average AUROC (area under the receiver operating characteristic curve) score of 89%. Using part-of-speech (POS) tagging, grammatical rules of the English language, and verb conjugation, we generate a formal representation of each sample. For each class, we form a paragraph from the formal representations of its samples, and we assemble these paragraphs into a case note.
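The note-generation step described above (formal representation per sample, one paragraph per class, paragraphs joined into a note) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lookup tables `PRONOUNS` and `VERBS`, the functions `formalize` and `build_case_note`, and the sample utterances are all hypothetical, and the small hand-written tables stand in for real POS tagging and verb conjugation.

```python
import re

# Hypothetical first-person to third-person rewrites (not from the paper).
PRONOUNS = {"my": "their", "me": "them", "myself": "themself"}

# Toy conjugation table standing in for POS tagging plus verb conjugation.
VERBS = {
    "am": "is",
    "have": "has",
    "feel": "feels",
    "take": "takes",
    "don't": "does not",
}


def formalize(utterance):
    """Rewrite a first-person utterance as a third-person statement."""
    tokens = re.findall(r"[A-Za-z']+", utterance.lower())
    out = []
    after_subject = False  # True right after the subject pronoun "I"
    for tok in tokens:
        if tok == "i":
            out.append("the patient")
            after_subject = True
            continue
        if after_subject and tok in VERBS:
            # Conjugate only the verb that directly follows the subject.
            out.append(VERBS[tok])
        else:
            out.append(PRONOUNS.get(tok, tok))
        after_subject = False
    if not out:
        return ""
    sentence = " ".join(out)
    return sentence[0].upper() + sentence[1:] + "."


def build_case_note(labeled_samples):
    """Group formalized samples by class label, one paragraph per class."""
    sections = {}
    for label, utterance in labeled_samples:
        sections.setdefault(label, []).append(formalize(utterance))
    return "\n".join(
        f"{label}: {' '.join(sentences)}" for label, sentences in sections.items()
    )


if __name__ == "__main__":
    # Invented samples; the class labels would come from the classifier.
    samples = [
        ("Medical history", "I take sertraline"),
        ("Family history", "My mother had depression"),
        ("Medical history", "I am tired"),
    ]
    print(build_case_note(samples))
```

In this sketch the class labels are supplied by hand; in the described system they would be the output of the supervised classifier. A lookup table handles only the verbs it lists, which is why a real pipeline would rely on a POS tagger and a conjugation library instead.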
Author Comment
In this project, we explore the feasibility of developing an automated case notes system for psychiatrists using machine learning (ML) and natural language processing (NLP) techniques.