Context-8: A Data Set for Evaluating Context Sensitivity in Machine Translation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3cawdf257c7e

Abstract

Context plays a crucial role in translation, enhancing both accuracy and fluency. With the advancement of machine translation (MT), the concept of context is now considered across an increasingly broad range of phenomena. Despite its importance, however, the systematic definitions of context provided by communication studies and translation studies remain fragmented, and the concept of context remains elusive in MT research. To the best of our knowledge, no dataset currently exists that comprehensively evaluates MT’s sensitivity to context. In this study, we propose a systematic taxonomy of context and introduce Context-8, an evaluation dataset designed to assess context sensitivity in English-to-Japanese MT. The initial release includes 130 groups comprising 533 English-to-Japanese translation examples, each requiring different context categories to produce accurate and fluent translations. The data are drawn from both hand-crafted examples and online materials. We release Context-8 to support the evaluation and benchmarking of MT systems with respect to context sensitivity.

Details

Paper ID
lrec2026-main-385
Pages
pp. 4902-4920
BibKey
wang-etal-2026-context
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Dongyue Wang

  • Kyo Kageura