Developing Automatic Speech Recognition for Scottish Gaelic

Proceedings of the 4th Celtic Language Technology Workshop within LREC2022

DOI:10.63317/5cqnv3fyjat8

Abstract

This paper discusses our efforts to develop a full automatic speech recognition (ASR) system for Scottish Gaelic, starting from a point of limited resources. Building ASR technology is important for documenting and revitalising endangered languages: it enables existing resources to be enhanced with automatic subtitles and transcriptions, improves accessibility for users, and, in turn, encourages continued use of the language. We explain the many difficulties faced when collecting minority language data for speech recognition. A novel cross-lingual approach to the alignment of training data is used to overcome one such difficulty, and in this way we demonstrate how majority language resources can bootstrap the development of lower-resourced language technology. We use the Kaldi speech recognition toolkit to develop several Gaelic ASR systems, and report a final word error rate (WER) of 26.30%. This is a 9.50% improvement on our original model.
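
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the recogniser's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition; it is not the authors' evaluation code or Kaldi's scoring pipeline, and the function name `word_error_rate` and the example strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Illustrative only: real scoring tools also normalise text
    (case, punctuation) before comparing, which is omitted here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
example = word_error_rate("a b c d", "a x c")
```

Under this definition a WER of 26.30% corresponds to roughly 26 word errors (substitutions, deletions, and insertions combined) per 100 reference words.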

Details

Paper ID
lrec2022-ws-cltw-16
Pages
pp. 110-120
BibKey
evans-etal-2022-developing
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022
Date
20-25 June 2022

Authors

  • Lucy Evans

  • William Lamb

  • Mark Sinclair

  • Beatrice Alex
