Compressed Representations of Patient Records: A Comparative Study of Template-Based and LLM-Based Methods for Clinical Data Summarization and Visualization
Proceedings of the Third Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC 2026
Abstract
Electronic Health Records (EHRs) contain comprehensive patient information that is often voluminous and difficult to review efficiently. This paper presents a systematic evaluation of methods for compressing patient records into standardized, comparable formats. Four compression approaches are implemented and compared: two template-based methods (structured extraction and extractive key-phrase) and two LLM-based methods (LLM-only and hybrid LLM, using 8 different models). Using a synthetic cohort of 75 patient records generated with realistic clinical patterns, each method is evaluated on information preservation (recall of diagnoses, medications, allergies, and lab values, plus vital sign accuracy), compression efficiency, and output quality. Across methods, diagnosis recall ranged from 0.637 to 1.000, while medication and allergy recall consistently exceeded 0.880. In the test setup, the template-based approach yielded the highest compression ratio (7.6×), while the hybrid methods provided the most balanced trade-off between compression and clinical utility. These results suggest that combining structured extraction with LLM-generated summaries can be an effective strategy for scenarios requiring both compact representations and contextual clinical information.
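To make the two headline metrics concrete, the following is a minimal illustrative sketch, not the evaluation code used in this study. It assumes that recall is the fraction of reference entities found in the compressed output by case-insensitive substring matching, and that the compression ratio is the character length of the original record divided by that of the compressed output. The function names `entity_recall` and `compression_ratio` and the example strings are hypothetical.

```python
def entity_recall(reference_entities, summary):
    """Fraction of reference entities that appear in the summary.

    Assumption: matching is a case-insensitive substring test; a real
    evaluation may normalize terms or map them to clinical codes.
    """
    if not reference_entities:
        return 1.0  # assumption: nothing to recall counts as perfect recall
    text = summary.lower()
    found = sum(1 for entity in reference_entities if entity.lower() in text)
    return found / len(reference_entities)


def compression_ratio(original, compressed):
    """Original length divided by compressed length, in characters.

    Assumption: length is measured in characters; tokens or words
    would be equally plausible units.
    """
    if not compressed:
        raise ValueError("compressed text must be non-empty")
    return len(original) / len(compressed)


# Hypothetical example data, not taken from the study's cohort.
diagnoses = ["Hypertension", "Asthma"]
summary = "Dx: hypertension. Meds: lisinopril 10 mg daily."
print(entity_recall(diagnoses, summary))  # one of two diagnoses is mentioned
```

Under these assumptions, a compression ratio of 7.6× means the compressed representation is roughly one 7.6th the length of the source record.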