
DEJIMA: A Novel Large-scale Japanese Dataset for Image Captioning and Visual Question Answering

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/45nioi7qjz28

Abstract

Vision-and-Language (V&L) models depend on large-scale, high-quality datasets, yet most resources are English-centric, and existing Japanese V&L datasets face a fundamental trade-off: manually annotated corpora offer quality but limited scale, translated datasets introduce unnatural phrasing and cultural bias, and web-crawled collections achieve scale but suffer from noise and poor grounding. To resolve this trade-off, we propose DEJIMA, a novel pipeline whose key idea is detection-guided LLM refinement: object detection first extracts visually verifiable evidence (labels and bounding boxes), then an LLM generates or refines Japanese text conditioned on this evidence, ensuring both factual grounding and linguistic naturalness without costly human annotation. Using this pipeline, we build two resources: an image–caption dataset (DEJIMA-Cap) and a VQA dataset (DEJIMA-VQA), each containing approximately 3.88M image–text pairs—over 20 times larger than existing Japanese V&L datasets. Human evaluations demonstrate that DEJIMA achieves substantially higher Japaneseness and linguistic naturalness than translation- or annotation-based baselines, while maintaining factual correctness comparable to human-annotated corpora. Models trained on DEJIMA show consistent improvements across multiple Japanese multimodal benchmarks, confirming that culturally grounded, large-scale resources play a key role in enhancing model performance. All pipeline components are commercially licensed, and we publicly release the dataset and metadata to support further research and applications. Our project page is available at https://mil-tokyo.github.io/DEJIMA-dataset/.
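As a rough illustration of the detection-guided refinement idea described in the abstract, the sketch below shows one way detector output (labels and bounding boxes) might be turned into the evidence section of an LLM prompt. It is not the authors' implementation: the `Detection` class, the `build_refinement_prompt` function, the `min_score` threshold, the pixel-coordinate box format, and the prompt wording are all hypothetical, and the actual detector and LLM calls are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical structure, not from the paper)."""
    label: str
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels
    score: float                            # assumed detector confidence in [0, 1]


def build_refinement_prompt(
    detections: Sequence[Detection],
    draft_text: Optional[str] = None,
    min_score: float = 0.5,  # hypothetical threshold, chosen only for illustration
) -> str:
    """Format detections as evidence and ask for grounded Japanese text.

    With no draft, the prompt asks for a new caption; with a draft, it asks
    for a rewrite that keeps only claims supported by the listed objects.
    """
    # Keep confident detections only, most confident first.
    kept = sorted(
        (d for d in detections if d.score >= min_score),
        key=lambda d: d.score,
        reverse=True,
    )

    lines = ["Detected objects (label, bounding box):"]
    for d in kept:
        x0, y0, x1, y1 = d.box
        lines.append(f"- {d.label}: [{x0:.0f}, {y0:.0f}, {x1:.0f}, {y1:.0f}]")

    if draft_text is None:
        lines.append(
            "Write a natural Japanese caption that mentions only the objects listed above."
        )
    else:
        lines.append(f"Draft text: {draft_text}")
        lines.append(
            "Rewrite the draft in natural Japanese, keeping only claims "
            "supported by the objects listed above."
        )
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        Detection("torii gate", (10.0, 20.0, 200.0, 300.0), 0.9),
        Detection("car", (0.0, 0.0, 5.0, 5.0), 0.2),  # dropped by the threshold
    ]
    print(build_refinement_prompt(example))
```

The returned string would then be passed to whichever LLM is used. Keeping the evidence as an explicit list makes it straightforward to check afterwards whether the generated text mentions anything the detector did not find.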

Details

Paper ID
lrec2026-main-744
Pages
pp. 9478-9489
BibKey
katsube-etal-2026-dejima
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Toshiki Katsube
  • Fukuhara Taiga
  • Kenichiro Ando
  • Yusuke Mukuta
  • Kohei Uehara
  • Tatsuya Harada

Links