Smart_solutions at MEDIQA-SYNUR 2026: A Multi-Stage LLM Pipeline for Nursing Observation Extraction
Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026
Abstract
Extracting clinical observations from nursing dictations is an important step toward reducing the burden of clinical documentation. In this work, we describe our submission to MEDIQA-SYNUR 2026, which placed third among participating teams with a balanced precision and recall of 0.80 on the unseen test set. Instead of fine-tuning LLMs, we adopt a multi-stage pipeline of agents: Observation Agent, Ontology Matching Agent, Relevance Scoring Agent, Evidence Assignment Agent, and Formatting Agent. First, the Observation Agent extracts clinical observations and corresponding evidence from the nurse transcript. The Ontology Matching Agent then maps these observations to a restricted set of candidate ontology fields via TF-IDF–based retrieval, and the Relevance Scoring Agent assigns a continuous support score (1–5) to each candidate field. Next, the Evidence Assignment Agent populates each ontology field with values extracted strictly from the nurse transcript and the clinical observations (Observation Agent outputs). Finally, the Formatting Agent formats these outputs to ensure the correct submission structure with the necessary metadata. Our results suggest that combining agents with prompt engineering can narrow the gap between general and specialized clinical NLP models, making this approach an immediately deployable alternative to traditional fine-tuning.
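To make the TF-IDF–based candidate retrieval step concrete, the following is a minimal, dependency-free sketch of how an observation could be matched against ontology field descriptions. It is an illustration under stated assumptions, not the submitted system: the field names, their descriptions, the `top_k` cutoff, the tokenizer, and the smoothed IDF formula are all hypothetical choices made for this example.

```python
import math
from collections import Counter

# Hypothetical ontology fields and descriptions (not taken from the task ontology).
EXAMPLE_FIELDS = {
    "heart_rate": "heart rate pulse beats per minute",
    "blood_pressure": "blood pressure systolic diastolic",
    "pain_level": "pain level score severity",
}


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for a corpus."""
    n = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidates(observation, ontology_fields, top_k=3):
    """Return up to top_k field names with non-zero similarity, best first."""
    names = list(ontology_fields)
    vectors, idf = tfidf_vectors([tokenize(ontology_fields[n]) for n in names])
    # Terms that never occur in any field description are ignored.
    query = {
        t: count * idf[t]
        for t, count in Counter(tokenize(observation)).items()
        if t in idf
    }
    scored = sorted(
        ((cosine(query, vec), name) for vec, name in zip(vectors, names)),
        reverse=True,
    )
    return [name for score, name in scored[:top_k] if score > 0.0]


candidates = retrieve_candidates(
    "Patient reports pain severity of 6", EXAMPLE_FIELDS, top_k=2
)
print(candidates)  # ['pain_level']
```

In a pipeline like the one described, a shortlist of this kind would restrict which ontology fields the later scoring stage has to consider, which keeps the prompts short and limits spurious field assignments.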