Interlinear Glosses as a Multilingual Pivot for Machine Translation: An Updated Study on Turkish with Restricted Resources
Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"
Abstract
Translating very low-resource languages is a challenge that has been approached using whatever linguistic cues are available. Among them, interlinear glosses are linguistic annotations that can bridge the gap between two languages because they carry both grammatical and lexical information. We perform a case study on a simulated low-resource condition for Turkish, a morphologically rich language, with a pipeline approach following Zhou et al. (2020). A source sentence is passed through a morphological analyzer and a bilingual dictionary to obtain a gloss-like representation. We then evaluate the current capacity of Neural Machine Translation systems and Large Language Models to translate interlinear glosses into fluent English. In particular, we evaluate how performance scales with multilingual glossed data and how translation is affected by pseudo-glosses. Pivoting with glosses remains a better approach than direct translation for languages with limited parallel training data. Although glosses remain helpful resources, translations are sensitive to gloss quality, especially that of the lexical information.
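To make the gloss-construction step concrete, the following is a minimal sketch of how a morphological analysis and a bilingual dictionary lookup could be combined into a gloss-like string. It is an illustration only, not the pipeline used in this study: the tables `TOY_ANALYSES` and `TOY_LEXICON`, the function `pseudo_gloss`, the tag labels, and the example sentence are all hypothetical, and a real system would call an actual Turkish morphological analyzer rather than a hand-written lookup.

```python
# Hypothetical stand-in for a morphological analyzer:
# surface form -> (lemma, list of grammatical tags).
# Entries are hand-written for illustration only.
TOY_ANALYSES = {
    "evlerimde": ("ev", ["PL", "1SG.POSS", "LOC"]),
    "kitap": ("kitap", []),
    "okudum": ("oku", ["PST", "1SG"]),
}

# Hypothetical bilingual dictionary: Turkish lemma -> English gloss.
TOY_LEXICON = {"ev": "house", "kitap": "book", "oku": "read"}


def pseudo_gloss(sentence: str) -> str:
    """Build a gloss-like string: translated lemma plus grammatical tags."""
    glossed = []
    for token in sentence.lower().split():
        # Unknown tokens are passed through unanalyzed.
        lemma, tags = TOY_ANALYSES.get(token, (token, []))
        # Keep the source lemma when the dictionary has no entry for it.
        lexical = TOY_LEXICON.get(lemma, lemma)
        glossed.append("-".join([lexical, *tags]))
    return " ".join(glossed)


print(pseudo_gloss("Evlerimde kitap okudum"))
# house-PL-1SG.POSS-LOC book read-PST-1SG
```

In this sketch, a missing dictionary entry leaves the Turkish lemma in the output, which is one simple way lexical gaps could surface in pseudo-glosses passed to the translation model.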