JMedWiC: A Japanese Word-in-Context Dataset in the Medical Domain

Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026

DOI:10.63317/2wqoixeze6fo

Abstract

We release JMedWiC, a Japanese dataset for Word-in-Context (WiC) tasks specifically tailored to the medical domain. To address the challenge of word sense disambiguation, where the meaning of a word varies depending on its context, previous research has developed WiC datasets to evaluate word sense identity by determining whether a target word shares the same sense across two given contexts. In the medical domain, the misinterpretation of word senses can hinder the accurate comprehension of medical information; however, there is currently no Japanese WiC dataset specialized for this domain. Moreover, existing WiC datasets have been constructed using lexical resources with sense inventories, such as WordNet and UMLS, but such resources are not sufficiently developed for Japanese. Therefore, we construct a Japanese WiC dataset in the medical domain by manually annotating sense-identity labels for target words in context pairs automatically extracted from a large-scale corpus, without relying on lexical resources.
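As a rough illustration of the task format described in the abstract, a WiC instance can be thought of as a target word, two contexts, and a binary sense-identity label. The sketch below is a minimal example under that assumption only: the class name `WiCInstance`, its field names, and the `accuracy` helper are hypothetical and do not reflect the released JMedWiC schema or the authors' evaluation code, and the instances are placeholders rather than dataset content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WiCInstance:
    """One Word-in-Context example (hypothetical field names)."""
    target: str       # target word that appears in both contexts
    context_a: str    # first context containing the target word
    context_b: str    # second context containing the target word
    same_sense: bool  # manually annotated sense-identity label


def accuracy(gold, predicted):
    """Fraction of instances whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty list")
    correct = sum(g.same_sense == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Placeholder instances for illustration only, not taken from JMedWiC.
examples = [
    WiCInstance("TARGET_1", "first context with TARGET_1", "second context with TARGET_1", True),
    WiCInstance("TARGET_2", "first context with TARGET_2", "second context with TARGET_2", False),
    WiCInstance("TARGET_3", "first context with TARGET_3", "second context with TARGET_3", True),
]

# Trivial baseline that always predicts "same sense".
always_same = [True] * len(examples)
score = accuracy(examples, always_same)
```

Framing the label as a boolean keeps the evaluation a plain binary classification, so any model that outputs a same-sense or different-sense decision per context pair can be scored with the same helper.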

Details

Paper ID: lrec2026-ws-clinicalnlp-24
Pages: pp. 222-227
BibKey: horiguchi-etal-2026-jmedwic
Editors: Asma Ben Abacha, Steven Bethard, Danielle Bitterman, Tristan Naumann, Kirk Roberts
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the 8th Workshop on Clinical Natural Language Processing (Clinical NLP) @ LREC 2026
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Koki Horiguchi
  • Seiji Sugiyama
  • Tomoyuki Kajiwara
  • Shoko Wakamiya
  • Eiji Aramaki
