
MedNormJ: A Benchmark Dataset for Medical Concept Normalization in Japanese Clinical Documents

Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026

DOI:10.63317/3tc32wmofkbm

Abstract

Medical concept normalization in clinical text is a fundamental technology for the secondary use of clinical data. However, constructing annotated resources for this task is challenging because annotation is both expertise-intensive and methodologically complex. As a result, a standard evaluation dataset for Japanese has yet to be established. In this study, we introduce a Japanese dataset for medical concept normalization, MedNormJ, which will be made publicly available. The dataset consists of 397 pairs of medical expressions and their corresponding normalized disease names, manually curated from 96 medical documents, including case reports and radiology reports. Furthermore, we conduct comparative experiments using existing normalization approaches to benchmark their performance on this dataset in terms of both accuracy and computational efficiency. Through these experiments, we clarify the current performance level and identify remaining challenges specific to Japanese medical concept normalization.

Details

Paper ID
lrec2026-ws-clinicalnlp-36
Pages
pp. 324-335
BibKey
tashiro-etal-2026-mednormj
Editors
Asma Ben Abacha, Steven Bethard, Danielle Bitterman, Tristan Naumann, Kirk Roberts
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Yuki Tashiro

  • Seiji Shimizu

  • Tomohiro Nishiyama

  • Shoko Wakamiya

  • Eiji Aramaki
