Register Sensitivity in Scalar MT Evaluation: Evidence from Spanish–Basque Informal Discourse

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

DOI:10.63317/4ig5743iz2r3

Abstract

Automatic scalar metrics are widely used for machine translation (MT) evaluation, yet their behavior under sociolinguistic variation remains underexplored, particularly in under-resourced and minority-language contexts. We present a small, controlled empirical analysis of reference-based evaluation in Spanish–Basque informal discourse. Register is operationalized as indexical density, which captures dialectal forms, informal lexicon, code-switching, and orthographic stylization. Across two MT systems and prompting conditions, sentence-level scores from chrF++, COMET-DA, and XCOMET-XL show a consistent negative association with indexical density when computed against the original informal reference. In a reference-perturbation design that holds MT outputs constant while replacing the informal reference with a standardized Batua version, scores increase systematically, particularly for high-density items, and the density–score association weakens. These results provide controlled evidence that evaluation outcomes in this setting depend in part on the register of the reference. In minority-language and informal domains, reference design choices may influence how translation quality is measured and interpreted.
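The reference-perturbation comparison described in the abstract can be illustrated with a minimal sketch. It assumes per-sentence indexical density values and sentence-level metric scores are already available for the same MT outputs under both references. The function names, the choice of Spearman rank correlation, and all numbers below are illustrative assumptions, not the paper's method or data.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation (assumed here as the association measure)."""
    return pearson(ranks(x), ranks(y))


def perturbation_summary(density, scores_informal, scores_standard):
    """Compare scores for the SAME MT outputs under two references."""
    deltas = [s - i for i, s in zip(scores_informal, scores_standard)]
    return {
        "rho_informal": spearman(density, scores_informal),
        "rho_standard": spearman(density, scores_standard),
        "mean_delta": mean(deltas),
        "rho_delta": spearman(density, deltas),
    }


# Hypothetical placeholder values, NOT results from the paper.
density = [0, 1, 2, 3, 4]                          # indexical density per item
scores_informal = [0.80, 0.70, 0.65, 0.50, 0.40]   # vs. informal reference
scores_standard = [0.82, 0.75, 0.78, 0.70, 0.72]   # vs. standardized reference

summary = perturbation_summary(density, scores_informal, scores_standard)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this sketch, a negative `rho_informal` that moves toward zero in `rho_standard`, together with a positive `mean_delta` and a positive `rho_delta`, would correspond to the pattern the abstract reports: higher scores under the standardized reference, with the largest gains on high-density items.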

Details

Paper ID

lrec2026-ws-sigul-02

Pages

pp. 19-32

DOI

10.63317/4ig5743iz2r3

BibKey

aranberri-2026-register

Editors

Atul Kr. Ojha, Sakriani Sakti, Claudia Soria, Maite Melero, John P. McCrae, Constantine Lignos, Chao-Hong Liu, German Rigau Claramunt, Georg Rehm

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

Location

Palma, Mallorca, Spain

Date

11–16 May 2026

Authors

Nora Aranberri
