Text-only Domain Adaptation for Low-Resource ASR Using Large Language Models
Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026
Abstract
Automatic Speech Recognition (ASR) increasingly mediates access to broadcast media, public discourse and cultural archives. For minoritised languages, however, the development of robust ASR systems is constrained by limited and domain-restricted text data. This paper investigates cross-lingual text expansion (XLTE), a method that uses a Large Language Model (LLM) to generate in-domain text in a low-resource language from high-resource language summaries. We further examine whether supervised fine-tuning on a small set of human-authored texts improves generation quality. Using Scottish Gaelic as a case study, we show that synthetic text generated via fine-tuned XLTE can be used to train an external language model that reduces Word Error Rate (WER) by 24.48% in a previously unseen broadcast domain. Our findings demonstrate that text-only domain adaptation through cross-lingual generation can strengthen speech technology in data-sparse settings. Beyond engineering gains, the approach offers a scalable pathway for improving the digital representation, accessibility and sustainability of minoritised-language media and cultural heritage.
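For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by reference length) and how a relative reduction between two systems could be derived from it. The abstract does not state whether the reported 24.48% is a relative or an absolute reduction, so the relative form shown here is an assumption. The function names and the toy token strings are illustrative and are not taken from the paper.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER = edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def relative_reduction(baseline_wer, adapted_wer):
    """Relative WER reduction of an adapted system over a baseline (assumed form)."""
    return (baseline_wer - adapted_wer) / baseline_wer


# Toy example with placeholder tokens: one substitution and one deletion
# against a four-word reference.
toy_wer = wer("a b c d", "a x c")
# Hypothetical WER values, chosen only to illustrate the arithmetic.
toy_gain = relative_reduction(0.5, 0.25)
```

Under this convention, a system whose WER falls from 0.5 to 0.25 would show a 50% relative reduction but a 25-point absolute reduction, which is why the distinction matters when interpreting a single headline figure.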