Generating case notes from digital transcripts using text mining

Gianforte School of Computing, Montana State University, Bozeman, Montana, United States
DOI
10.7287/peerj.preprints.27497v1
Subject Areas
Bioinformatics, Data Mining and Machine Learning, Natural Language and Speech
Keywords
Machine Learning, Text Mining, Psychiatry, Case notes, Mental Health, EHR, Natural Language Processing
Copyright
© 2019 Kazi et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Kazi N, Kahanda I. 2019. Generating case notes from digital transcripts using text mining. PeerJ Preprints 7:e27497v1

Abstract

Current health care systems require clinicians to spend a substantial amount of time digitally documenting their interactions with patients in electronic health records (EHRs), which limits the time available for face-to-face patient care. Moreover, EHR use is known to be highly inefficient because of the additional time documentation takes to complete, which also contributes to clinician burnout. In this project, we explore the feasibility of developing an automated case notes system for psychiatrists using text mining techniques. The system will listen to doctor-patient conversations, generate digital transcripts using speech-to-text conversion, classify information from the transcripts by identifying important keywords, and automatically generate structured case notes.

In our preliminary work, we develop a human-powered doctor-patient transcript annotator and obtain a gold standard dataset through the National Alliance on Mental Illness (NAMI) Montana. We model the task of classifying parts of conversations into six broad categories, such as medical and family history, as a supervised classification problem and apply several popular machine learning algorithms. According to our preliminary experimental results, obtained through 5-fold cross-validation, Support Vector Machines are able to classify an unseen transcript with an average AUROC (area under the receiver operating characteristic curve) score of 89%. Using part-of-speech (POS) tagging, grammatical rules of the English language, and verb conjugation, we generate a formal representation of each sample. For each class, we form a paragraph from the formal representations of its samples, and we combine these paragraphs into a case note.
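
To make the last step concrete, the following is a minimal sketch of how classified samples might be turned into a case note. It is not the authors' implementation. It assumes each sample already carries a category label from the classifier, and it replaces POS tagging with a crude rule: the word directly after "I" is treated as a present-tense verb. The function names (`third_person`, `formalize`, `build_case_note`), the word lists, and the category labels are all hypothetical.

```python
# Illustrative word lists (assumptions, not taken from the paper).
PRONOUNS = {"i": "the patient", "my": "their", "me": "them"}
IRREGULAR = {"am": "is", "have": "has", "do": "does"}
UNCHANGED = {"can", "can't", "cannot", "will", "would", "could",
             "should", "may", "might", "must"}


def third_person(verb):
    """Conjugate a present-tense verb for a third-person singular subject."""
    low = verb.lower()
    if low in UNCHANGED:            # modal verbs do not take an -s ending
        return verb
    if low in IRREGULAR:
        return IRREGULAR[low]
    if low.endswith(("s", "sh", "ch", "x", "z", "o")):
        return verb + "es"
    if low.endswith("y") and len(low) > 1 and low[-2] not in "aeiou":
        return verb[:-1] + "ies"    # worry -> worries
    return verb + "s"


def formalize(utterance):
    """Rewrite a first-person utterance as a third-person statement."""
    words = utterance.strip().rstrip(".!?").split()
    if not words:
        return ""
    out = []
    conjugate_next = False
    for word in words:
        low = word.lower()
        if conjugate_next:
            # Crude stand-in for POS tagging: assume this word is the verb.
            out.append(third_person(word))
            conjugate_next = False
        elif low in PRONOUNS:
            out.append(PRONOUNS[low])
            conjugate_next = (low == "i")
        else:
            out.append(word)
    sentence = " ".join(out)
    return sentence[0].upper() + sentence[1:] + "."


def build_case_note(samples):
    """Group (category, utterance) pairs into one paragraph per category."""
    paragraphs = {}
    for category, utterance in samples:
        paragraphs.setdefault(category, []).append(formalize(utterance))
    return "\n".join(
        f"{category}: {' '.join(sentences)}"
        for category, sentences in paragraphs.items()
    )


if __name__ == "__main__":
    print(build_case_note([
        ("Medical history", "I take medication for my thyroid."),
        ("Family history", "My mother has depression."),
        ("Medical history", "I have trouble sleeping."),
    ]))
```

The sketch mishandles past-tense verbs and adverbs that follow "I" (for example, "I often feel"), which is the kind of case that POS tagging and fuller conjugation rules would be needed to resolve.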

Author Comment

In this project, we explore the feasibility of developing an automated case notes system for psychiatrists using machine learning (ML) and natural language processing (NLP) techniques.