Lightweight clinical language modeling: QLoRA-based fine-tuning of LLaMA for domain-specific medical text analysis
Abstract
Large language models (LLMs) have performed remarkably well on NLP tasks and have redefined how medical and clinical text analysis is approached. However, general-purpose LLMs such as ChatGPT and Gemini are neither sufficiently accurate nor reliable enough to be trusted in the medical domain, and full fine-tuning is typically not viable in low-resource environments, putting it out of reach for practical clinical deployment. To overcome this, we explore Quantized Low-Rank Adaptation (QLoRA) for parameter-efficient fine-tuning of LLaMA 3 on clinical text. QLoRA adds trainable low-rank adaptation (LoRA) layers on top of a quantized pre-trained model, substantially reducing GPU memory requirements without sacrificing model performance. In this paper, we fine-tune LLaMA 3 on clinical corpora with QLoRA and evaluate it against baseline fine-tuning approaches to quantify the efficiency-performance trade-offs. Our results demonstrate that QLoRA achieves competitive accuracy with more than 50% lower memory consumption, enabling fine-tuning on consumer-grade GPUs. We also introduce a carefully curated Indian clinical dataset of 100K annotated sentences covering Indian-specific drugs, diagnoses, symptoms, and prescriptions, and we extend our approach to accept medical images (e.g., scans, prescriptions) as input via multimodal integration. Overall, QLoRA offers an economical, scalable route to fine-tuning LLMs for clinical text analysis; with lower computational overhead and preserved accuracy, it makes deploying LLaMA in real medical environments practical and broadens access to clinical NLP applications.
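The core mechanism described above (frozen quantized base weights plus small trainable low-rank adapters) can be illustrated with a minimal, self-contained sketch. This is not the paper's training code: real QLoRA uses 4-bit NormalFloat quantization via bitsandbytes and the PEFT library, whereas here int8 quantization of a single linear layer stands in for the idea. The class name `QLoRALinear` and all hyperparameters (`rank`, `alpha`) are illustrative assumptions.

```python
# Illustrative sketch of the QLoRA idea: a frozen, quantized base weight
# plus trainable low-rank LoRA factors (delta_W = B @ A).
# Not the paper's implementation; real QLoRA uses 4-bit NF4 quantization.
import torch
import torch.nn as nn

class QLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4, alpha=16):
        super().__init__()
        # Frozen base weight, stored as int8 to mimic quantization.
        w = torch.randn(out_features, in_features)
        self.scale = (w.abs().max() / 127).item()
        self.register_buffer("w_q", torch.round(w / self.scale).to(torch.int8))
        # Trainable low-rank adapters; B starts at zero so the adapted
        # layer initially matches the quantized base model.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.alpha, self.rank = alpha, rank

    def forward(self, x):
        w = self.w_q.float() * self.scale              # dequantize on the fly
        delta = (self.alpha / self.rank) * (self.B @ self.A)
        return x @ (w + delta).T

layer = QLoRALinear(64, 32, rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
# Only the LoRA factors train: 4*64 + 32*4 = 384 params,
# versus 32*64 = 2048 in the frozen base weight.
```

The memory savings claimed in the abstract come from exactly this split: the large base weights are stored quantized and never receive gradients, so optimizer state is kept only for the small `A` and `B` matrices.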