AnotherOne at MEDIQA-SYNUR 2026: Detect, Extract, Normalize - Knowledge-Grounded LLM Pipeline for Clinical Observation Extraction
Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026
Abstract
We present a system for the MEDIQA-SYNUR 2026 shared task on extracting structured clinical observations from nurse dictation transcripts, which contain spoken-style clinical language with disfluencies, filler words, and hesitations. Our approach is a four-stage inference pipeline preceded by an offline knowledge enhancement step: (1) knowledge-enhanced concept detection using medical domain clustering, (2) evidence-grounded value extraction, (3) schema-constrained value normalization, and (4) deterministic post-processing with fuzzy matching and unit pairing. In the offline step, we use the task ontology and training examples to generate per-concept clinical definitions and extraction rules, and we group the 193 concepts into 19 non-exclusive medical domain clusters. The resulting knowledge is injected into all downstream prompts as domain priors. All LLM stages use gpt-oss-120b with structured JSON output and chain-of-thought reasoning. Because the task scores exact matches on (concept ID, value) pairs across the 193-concept ontology, precision is particularly challenging. We iteratively refine concept definitions and prompt guidelines based on error analysis of the training data. Our system achieves an F1 score of 0.806 on the test set.
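The exact-match requirement explains why precision is hard. A prediction earns credit only if both the concept ID and the value string are identical to the reference. The sketch below shows this scoring under an assumption: F1 is micro-averaged over a multiset of (concept ID, value) pairs. The official scorer may differ in details such as per-transcript averaging. The function name `pair_f1` and the concept IDs `c1` through `c4` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# A single observation: (concept_id, value). IDs used below are hypothetical.
Pair = Tuple[str, str]


def pair_f1(gold: Iterable[Pair], pred: Iterable[Pair]) -> float:
    """Micro F1 over exact (concept_id, value) matches (assumed scoring scheme)."""
    gold_c, pred_c = Counter(gold), Counter(pred)
    # Multiset intersection: a pair counts as a true positive at most as often
    # as it appears in both gold and prediction.
    tp = sum((gold_c & pred_c).values())
    if tp == 0:
        return 0.0
    precision = tp / sum(pred_c.values())
    recall = tp / sum(gold_c.values())
    return 2 * precision * recall / (precision + recall)


gold = [("c1", "120/80"), ("c2", "yes"), ("c3", "98.6")]
pred = [("c1", "120/80"), ("c2", "Yes"), ("c4", "no")]  # "Yes" != "yes" under exact match
print(pair_f1(gold, pred))
```

In this example, a casing mismatch on a correct concept costs as much as predicting the wrong concept: one false positive plus one false negative. Because exact matching penalizes near-misses this heavily, deterministic normalization of values is worthwhile.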
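The fuzzy-matching part of the deterministic post-processing stage could take roughly the following form. This is a minimal sketch, not the system's implementation. It assumes that each concept in the ontology lists its allowed option strings, and that an extracted value is snapped to the closest option or dropped if nothing is close enough. The function `snap_to_option` and its `cutoff` value are hypothetical, and unit pairing is not covered. Similarity comes from Python's standard `difflib.get_close_matches`.

```python
import difflib
from typing import Optional, Sequence


def snap_to_option(value: str, options: Sequence[str], cutoff: float = 0.8) -> Optional[str]:
    """Map a free-text value to the closest allowed option, or None (hypothetical helper)."""
    # Index options by lowercased form so casing never blocks a match.
    lowered = {opt.lower(): opt for opt in options}
    key = value.strip().lower()
    # Exact case-insensitive hit: return the canonical option string directly.
    if key in lowered:
        return lowered[key]
    # Otherwise take the single best difflib similarity match above the cutoff.
    matches = difflib.get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


print(snap_to_option("Yes", ["yes", "no"]))                 # exact after lowercasing
print(snap_to_option("regullar", ["regular", "irregular"]))  # misspelling snapped to closest option
print(snap_to_option("banana", ["yes", "no"]))               # nothing close enough
```

Returning `None` rather than forcing the nearest option reflects the precision concern. Under exact-match scoring, dropping an unmatched value loses one true positive, but emitting a wrong value also adds a false positive.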