Context-Aware SNOMED CT Entity Linking for Clinical Text
Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026
Abstract
Mapping free-text mentions in clinical notes to standardized terminologies such as SNOMED CT is essential for large-scale secondary use of electronic health records, but remains challenging due to linguistic variability, under-specified annotation guidelines, term ambiguity, and the scale of the ontology. This work presents a two-stage entity linking pipeline that combines span detection with context-aware concept linking, and evaluates it on the SNOMED CT Entity Linking Challenge dataset. The pipeline builds upon the SNOMED CT Entity Linking Challenge and, to our knowledge, is the first end-to-end, fully open-source system for this task. For span detection, we compare multiple neural architectures alongside dictionary-based matching. For concept linking, we adopt a context-aware bi-encoder and construct a multi-source knowledge base enriched with context derived from the SNOMED CT ontology. Finally, we implement an agentic re-ranker and test the effectiveness of LLM-backed re-ranking with access to the annotation guidelines. In contrast to findings from the original shared task submissions, we show that context is important for optimal performance, and that agentic re-ranking with a state-of-the-art LLM improves overall performance only marginally, suggesting that the current benchmark may be approaching its practical ceiling. The resulting system is reproducible and offers a foundation for future research and practical deployment.
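As a rough illustration of the two-stage idea described above (span detection followed by context-aware concept linking), the following minimal sketch may help. It is not the system evaluated in this paper: the knowledge base, the concept identifiers (`C001`, `C002`, `C003`), and all function names are hypothetical, and a bag-of-words cosine similarity stands in for the learned bi-encoder.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base. The IDs and texts are placeholders,
# NOT real SNOMED CT concepts or descriptions.
KB = {
    "C001": {"terms": ["cold", "common cold"],
             "context": "viral infection nose throat cough runny"},
    "C002": {"terms": ["cold", "cold sensation"],
             "context": "temperature sensation skin hands feet touch"},
    "C003": {"terms": ["cough"],
             "context": "respiratory symptom airway"},
}


def embed(text):
    """Stand-in encoder (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing keys
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def detect_spans(text):
    """Stage 1: dictionary-based span detection, longest terms first."""
    lowered = text.lower()
    terms = {t for concept in KB.values() for t in concept["terms"]}
    spans = []
    for term in sorted(terms, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            # Skip matches that overlap an already accepted (longer) span.
            if not any(m.start() < e and s < m.end() for s, e, _ in spans):
                spans.append((m.start(), m.end(), term))
    return sorted(spans)


def link(text, span, window=40):
    """Stage 2: rank candidate concepts by similarity to the mention context."""
    start, end, surface = span
    query = embed(text[max(0, start - window):end + window])
    candidates = [cid for cid, c in KB.items() if surface in c["terms"]]

    def score(cid):
        entry = KB[cid]
        return cosine(query, embed(" ".join(entry["terms"]) + " " + entry["context"]))

    return max(candidates, key=score)


def link_note(text):
    """Run both stages and return (surface form, concept ID) pairs."""
    return [(span[2], link(text, span)) for span in detect_spans(text)]
```

With this toy data, the ambiguous mention "cold" in "Runny nose and cough with a cold for three days." resolves to the placeholder infection concept, while in "Her hands and feet feel cold to the touch." it resolves to the placeholder sensation concept. The surface form is identical in both notes, so only the surrounding context separates the two candidates.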