JPPB: Automatic Construction of a Soft-Labeled Japanese Patient Phrase Bank for Symptom Normalization

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/345uq5t7y98h

Abstract

Patient-generated symptom expressions are linguistically diverse, often deviating from standardized medical terminology. This paper introduces the Japanese Patient Phrase Bank (JPPB), the first automatically constructed phrase-level normalization resource for Japanese patient language. JPPB introduces an embedding-based soft labeling framework that transforms traditional one-to-one dictionary mappings into graded and ambiguity-aware associations. This framework represents a shift from word-level to phrase-level normalization in Japanese. The resource covers 7,035 phrase–term pairs across 412 symptoms. Evaluation on the KEEPHA and MedNLP-SC datasets shows that soft labels consistently improve Top-1 accuracy and better approximate gold label distributions compared with hard labels. While LLM-based normalization achieved the highest scores, JPPB provides a lightweight and transparent alternative suitable for local deployment. This work demonstrates that large-scale, automatically generated phrase banks can achieve competitive performance relative to manually curated resources and serve as practical, scalable resources for medical natural language processing in Japanese.
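
The difference between a hard label and a soft label can be illustrated with a small sketch. The abstract does not say how JPPB computes its soft labels, so everything below is an assumption made for illustration: cosine similarity as the scoring function, a temperature softmax to turn scores into a distribution, the `temperature` value, the function names, and the three-dimensional toy vectors standing in for real phrase and term embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_labels(phrase_vec, term_vecs, temperature=0.1):
    """Map one phrase embedding to a probability distribution over terms.

    Assumption: softmax over cosine similarities. The paper's actual
    scoring and normalization may differ.
    """
    sims = {term: cosine(phrase_vec, vec) for term, vec in term_vecs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {t: math.exp((s - top) / temperature) for t, s in sims.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def hard_label(soft):
    """Collapse a soft label to the single most likely term (Top-1)."""
    return max(soft, key=soft.get)


# Hypothetical toy embeddings, not taken from JPPB.
term_vecs = {
    "headache": [1.0, 0.0, 0.0],
    "dizziness": [0.0, 1.0, 0.0],
    "nausea": [0.0, 0.0, 1.0],
}
phrase_vec = [0.8, 0.6, 0.0]  # a phrase lying between two symptom terms

labels = soft_labels(phrase_vec, term_vecs)
for term, prob in sorted(labels.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {prob:.3f}")
print("hard label:", hard_label(labels))
```

In this toy case the hard label keeps only the top-ranked term, while the soft label also retains the non-trivial weight on the second term, which is the kind of graded, ambiguity-aware association the abstract describes.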

Details

Paper ID
lrec2026-main-621
Pages
pp. 7816-7828
BibKey
nishiyama-etal-2026-jppb
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Tomohiro Nishiyama

  • Mana Kuramoto

  • Shoko Wakamiya

  • Eiji Aramaki

Links