
Lakefront AI Ramblers at MEDIQA-SYNUR 2026: Hybrid Retrieval and LLM Verification for Open-Source Schema-Guided Clinical Information Extraction

Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026

DOI:10.63317/2jfca3zi8p4o

Abstract

Schema-constrained clinical information extraction requires identifying text-supported observations and outputting exact schema identifiers and values. In the MEDIQA-SYNUR 2026 shared task, synthetic nursing dictations were mapped to structured JSON outputs aligned with a 193-concept clinical schema under strict exact-match evaluation. We extended the baseline pipeline, which consists of transcript segmentation, schema retrieval, and LLM-based extraction, with hybrid schema retrieval, supervised fine-tuning (SFT) of open-source LLMs, and LLM-based verification. Our hybrid retrieval approach combined dense embeddings with sparse BM25 representations through a convex combination of their scores, improving schema coverage to 0.994 recall@60 on the development set. We evaluated GPT-4o, GPT-4o-mini, Llama-3-8B-Instruct, and Llama-3.3-70B-Instruct, applying LoRA-based SFT to the open-source models. On the official test set, our best submitted configuration (Llama-3.3-70B-Instruct-SFT with union voting and GPT-4o-mini verification) achieved 0.711 F1. Post-competition experiments showed that Llama-3-8B-Instruct-SFT (train + dev) reached 0.723 F1 under the same post-processing pipeline. For reference, GPT-4o achieved 0.791 F1 and did not benefit from post-processing. Performance differences between the development and test splits further highlight how sensitive post-processing strategies are to distributional variation across splits. Overall, integrating high-recall retrieval, SFT, and LLM verification substantially narrows the performance gap between open- and closed-source models for schema-guided clinical extraction.
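
As an illustration of the convex combination of dense and BM25 scores mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes per-concept scores have already been computed by some dense encoder and some BM25 index, that each score set is min-max normalized before mixing, and that the mixing weight is a single scalar `alpha`. The function names, the normalization choice, and the concept identifiers used in the usage note are all hypothetical.

```python
from typing import Dict, List, Set


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant (or empty) map yields zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def convex_fuse(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Fused score = alpha * dense + (1 - alpha) * sparse, after normalization.

    Concepts missing from one retriever are treated as scoring 0 there
    (an assumption of this sketch).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    d, s = min_max(dense), min_max(sparse)
    return {
        k: alpha * d.get(k, 0.0) + (1.0 - alpha) * s.get(k, 0.0)
        for k in set(d) | set(s)
    }


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring concept ids (ties broken by id)."""
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


def recall_at_k(retrieved: List[str], gold: Set[str]) -> float:
    """Fraction of gold concepts that appear in the retrieved list."""
    if not gold:
        return 1.0
    return len(gold & set(retrieved)) / len(gold)
```

In use, one would fuse the two score maps for a transcript segment, keep the top `k` concepts (the abstract reports recall at `k = 60`), and average `recall_at_k` over segments with known gold concepts, for example `recall_at_k(top_k(convex_fuse(dense, sparse, alpha=0.5), 60), gold)`.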

Details

Paper ID
lrec2026-ws-clinicalnlp-23
Pages
pp. 212-221
BibKey
saban-etal-2026-lakefront
Editors
Asma Ben Abacha, Steven Bethard, Danielle Bitterman, Tristan Naumann, Kirk Roberts
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Michael T. Saban

  • Arsalan Yaghoubi

  • Behnaz Eslami

  • Samie Tootooni

  • Dmitriy Dligach
