A Comparative Study of Multilingual Fine-tuning and Prompting for Automatic Text Readability Classification in Galician

Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026

Abstract

Despite advances in automatic readability assessment, low-resource languages such as Galician remain under-explored. This paper addresses this gap with a comparative study of readability assessment techniques in Galician, covering both fine-tuning of encoder models and prompting strategies with large generative models. Because native Galician resources are scarce, neural machine translation was used to generate synthetic Galician data. The analysis begins with BERT-based monolingual models trained on the synthetic data. For multilingual models, performance with original versus translated training data was compared to assess the effect of translation-based augmentation. Finally, several large language models (LLMs) were evaluated with zero-shot and few-shot prompting. The results indicate that generative models are not yet competitive with encoder models fine-tuned for text classification in Galician, and that machine-translated data improves the performance of monolingual models but has little effect on multilingual models.

Details

Paper ID

lrec2026-ws-readixtsar-08

Pages

pp. 101-120

DOI

10.63317/4tnwhe3r9579

BibKey

rodrguezrey-etal-2026-comparative

Editors

Matthew Shardlow, Thomas François, Raquel Amaro, Jorge Baptista, Rémi Cardon, Eugénio Ribeiro, Horacio Saggion, Regina Stodden, Amalia Todirascu, Rodrigo Wilkens

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Sandra Rodríguez Rey
Marcos Garcia