Generative small language models in clinical NLP: Applications, adaptation, and evaluation
Abstract
Large language models (LLMs) such as GPT-4 and Med-PaLM have transformed clinical NLP, demonstrating remarkable capabilities in understanding and generating medical text. However, their deployment in healthcare remains constrained by high computational demands, data privacy risks, and limited interpretability. In response, this survey provides a comprehensive synthesis of research on small language models (SLMs) in medicine, presenting an engineering-oriented roadmap for developing efficient, trustworthy, and clinically deployable language models. We systematically review lightweight architectures, parameter-efficient adaptation strategies, and data-centric training techniques that enable domain-specific optimization under limited computational resources. The survey organizes SLM applications across core NLP tasks (classification, relation extraction, summarization, and question answering) and further examines specialized models designed for domains such as radiology, oncology, ophthalmology, and nuclear medicine. We also introduce a multidimensional evaluation framework that combines quantitative and qualitative metrics to assess factual accuracy, clinical reliability, and computational efficiency. Finally, we identify ongoing challenges and future research directions, emphasizing the role of retrieval-augmented generation (RAG), reinforcement learning from human feedback (RLHF), and emerging paradigms such as agentic AI. Collectively, this study establishes a foundation for engineering-driven SLM pipelines that balance efficiency, adaptability, and clinical trustworthiness in real-world healthcare applications. This review is intended for a broad interdisciplinary audience, including clinical NLP researchers, machine learning practitioners, healthcare AI engineers, biomedical informatics specialists, and clinicians seeking practical guidance on deploying efficient and privacy-preserving language technologies.