LREC 2026 (Main Conference)

J-ClinicalBench: A Benchmark for Evaluating Large Language Models on Practical Clinical Tasks in Japanese

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2uwf25atuoom

Abstract

Recent advances in large language models (LLMs) have accelerated NLP applications in the medical and clinical domains. However, evaluations remain limited for non-English languages such as Japanese, where clinical corpora are particularly scarce. To address this gap, we present J-ClinicalBench, a publicly available benchmark designed to reflect realistic Japanese clinical tasks. We first created 227 expert-authored clinical documents and constructed five new datasets for core clinical tasks. Building on these datasets, J-ClinicalBench comprises nine clinical tasks spanning clinical language reasoning, generation, and understanding. We establish baseline performance on J-ClinicalBench by evaluating state-of-the-art proprietary and Japanese open-source LLMs, providing the first assessment of their utility in practical clinical scenarios. By releasing this benchmark, we aim to foster the development and evaluation of clinically applicable LLMs for Japanese healthcare, bridging the current gap between clinical NLP research and clinical practice.

Details

Paper ID: lrec2026-main-028
Pages: pp. 419-430
BibKey: shimizu-etal-2026-clinicalbench
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 978-2-493814-49-4
Conference: The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location: Palma, Mallorca, Spain
Date: 11–16 May 2026

Authors

  • Seiji Shimizu
  • Tomohiro Nishiyama
  • Hisada Shohei
  • Yamato Himi
  • Shoko Wakamiya
  • Yuki Yanagisawa
  • Masami Tsuchiya
  • Satoko Hori
  • Eiji Aramaki
