Benchmarking Multilingual LLM Translation Accuracy for Fuzhounese

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

DOI: 10.63317/4mm9bs8yy4ie

Abstract

Multilingual large language models are known to perform well on high-resource languages, while their ability to process severely under-resourced languages remains underexplored. We investigate multilingual LLM translation performance on Fuzhounese, an under-resourced Sinitic language with no standardized orthography and almost no digital presence. Adopting methodological insights from the HKCanto-Eval benchmark, this paper presents a bidirectional translation framework, based on a dataset of 305 sentences (300 constructed English sentences and 5 additional reference translations), that assesses both the comprehension and the generation of Fuzhounese using automatic metrics and human Likert-scale judgments. The results reveal poor performance on Fuzhounese in both translation directions: BERTScore and chrF++ values remain consistently low on comprehension tasks, while on generation tasks the scores are generally less than half of those for Mandarin or Cantonese. These findings highlight structural biases in multilingual LLMs toward high-resource languages and underscore the need for resource-aware modeling and evaluation approaches in multilingual NLP systems.
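To make the chrF++ metric mentioned above more concrete, here is a minimal sentence-level sketch in Python. It assumes the commonly used chrF++ settings: character n-grams of order 1-6 with whitespace removed, word n-grams of order 1-2, and beta = 2, so recall is weighted more heavily than precision. It averages precision and recall over the n-gram orders that occur in both strings and then combines them into an F-score. The function names are hypothetical and do not come from the paper, and the scores may differ slightly from those of reference implementations.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # Hypothetical helper: character n-grams with whitespace removed, as in chrF.
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    # Hypothetical helper: word n-grams over whitespace-separated tokens (the "++" part).
    toks = text.split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def chrf_pp(hypothesis: str, reference: str,
            char_order: int = 6, word_order: int = 2, beta: float = 2.0) -> float:
    """Sentence-level chrF++ sketch on a 0-100 scale (assumed settings, not the paper's code)."""
    precisions, recalls = [], []
    orders = [(char_ngrams, n) for n in range(1, char_order + 1)]
    orders += [(word_ngrams, n) for n in range(1, word_order + 1)]
    for extract, n in orders:
        hyp, ref = extract(hypothesis, n), extract(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the two strings
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta combination of averaged precision and recall
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(chrf_pp("the cat sat on the mat", "the cat sat on the mat"))
```

For reported results, a standard implementation such as the CHRF metric in sacrebleu with word_order=2 is preferable to this sketch. If Fuzhounese, Mandarin, or Cantonese outputs are written in Han characters without spaces, the word-level orders would also require word segmentation.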

Details

Paper ID

lrec2026-ws-sigul-20

Pages

pp. 198-209

DOI

10.63317/4mm9bs8yy4ie

BibKey

zheng-etal-2026-benchmarking

Editors

Atul Kr. Ojha, Sakriani Sakti, Claudia Soria, Maite Melero, John P. McCrae, Constantine Lignos, Chao-Hong Liu, German Rigau Claramunt, Georg Rehm

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Sue Zheng
Jelke Bloem
