Context-8: A Data Set for Evaluating Context Sensitivity in Machine Translation
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Context plays a crucial role in translation, enhancing both accuracy and fluency. As machine translation (MT) has advanced, context has come to be considered across an increasingly broad range of phenomena. Despite its importance, however, the systematic definitions of context provided by communication studies and translation studies remain fragmented, and the concept remains elusive in MT research. To the best of our knowledge, no dataset currently exists that comprehensively evaluates the sensitivity of MT to context. In this study, we propose a systematic taxonomy of context and introduce Context-8, an evaluation dataset designed to assess context sensitivity in English-to-Japanese MT. The initial release includes 130 groups comprising 533 English-to-Japanese translation examples, each requiring different context categories to produce accurate and fluent translations. The data are drawn from both hand-crafted and online materials. We release Context-8 to support the evaluation and benchmarking of MT systems with respect to context sensitivity.