Automatic Generation of Discharge Summaries Using Large Language Models: A Systematic Literature Review
Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026
Abstract
Discharge summaries are critical to continuity of care, yet writing them manually places a significant burden on clinical staff. This systematic literature review examines current approaches to the automatic generation of discharge summaries using Natural Language Processing (NLP) and Large Language Models (LLMs). Following the Kitchenham guidelines for systematic reviews in software engineering, we searched the Scopus and PubMed databases for studies published between 2023 and 2026 and identified 9 primary studies from an initial pool of 102 papers. Our analysis shows that GPT-4 and its variants dominate current research, appearing in 6 of the 9 studies, while open-source alternatives such as LLaMA show promise for privacy-preserving deployments. Evaluation relies primarily on automatic metrics (ROUGE, BLEU) combined with human expert assessment. Key challenges include hallucination, with reported rates ranging from 33% to 64%, as well as information omission, integration with Electronic Health Record (EHR) systems, and context window limitations. Studies that address factuality employ human-in-the-loop validation, prompt engineering techniques, and knowledge graph-based correction mechanisms. Despite these challenges, recent implementations demonstrate clinical feasibility, with one study reporting a System Usability Scale (SUS) score of 94.35%. This review synthesizes the state of the art and identifies opportunities for future research in this rapidly evolving field.