LTRC-Medicom at MEDIQA-SYNUR 2026: Schema-Guided Clinical Information Extraction with Hybrid Clustering-SFT-Verification

Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026

DOI:10.63317/4oou5ss2efr6

Abstract

Extracting structured clinical data from unstructured patient transcripts is challenging due to large target schemas and inherent linguistic ambiguity. We address the extraction of 193 heterogeneous clinical attributes from nursing notes and clinician–patient dialogues, and demonstrate that zero-shot large language models (LLMs) are ineffective in this setting, achieving an F1 score below 0.15 due to context window saturation and hallucination. We propose a four-stage framework that combines semantic schema clustering, role-based chain-of-thought prompting, supervised fine-tuning of Llama-3.1-8B, and transcript-verified post-processing. Our approach achieves an F1 score of 0.66, representing a 4.4x improvement over the baseline, by balancing high recall from generative models with high precision from verification. These results highlight the effectiveness of hybrid pipelines for high-stakes clinical information extraction.
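The transcript-verified post-processing stage named in the abstract can be pictured as a filter that drops any generated value lacking support in the source text. The sketch below is illustrative only and is not the authors' implementation: the function names, the attribute names, and the normalized exact-substring matching rule are all assumptions made for this example.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def verify_against_transcript(extracted: dict[str, str], transcript: str) -> dict[str, str]:
    """Keep only attribute values whose normalized text occurs in the transcript.

    Assumption: a value counts as supported when it appears as a whole-word
    substring of the transcript. A real system may use a looser criterion.
    """
    haystack = f" {normalize(transcript)} "
    kept = {}
    for attribute, value in extracted.items():
        needle = normalize(value)
        if needle and f" {needle} " in haystack:
            kept[attribute] = value
    return kept


# Hypothetical transcript and model output, invented for illustration.
transcript = "Patient reports a pain level of 7 out of 10. Oxygen saturation is 94 percent."
candidates = {
    "pain_level": "7 out of 10",
    "oxygen_saturation": "94 percent",
    "heart_rate": "110 bpm",  # not stated in the transcript, so it is dropped
}
verified = verify_against_transcript(candidates, transcript)
```

A filter of this kind trades recall for precision: it removes hallucinated values, but it would also discard correct values that the model paraphrased or mapped to a schema label, so exact matching is best read as the simplest possible stand-in for verification.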

Details

Paper ID: lrec2026-ws-clinicalnlp-25
Pages: pp. 228-234
BibKey: deepak-etal-2026-ltrc
Editors: Asma Ben Abacha, Steven Bethard, Danielle Bitterman, Tristan Naumann, Kirk Roberts
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Pasumarthy Deepak
  • Sushvin Marimuthu
  • Parameswari Krishnamurthy
