Phonologically-aware Automatic Speech Recognition Evaluation of Low-Resource Languages: The Case of Basque Dialects

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

Abstract

Automatic speech recognition models are typically trained on standard-language data, and their performance degrades on non-standard dialectal speech. In this paper, we present the first evaluation of an automatic speech recognition system for Basque, a low-resource language, on spontaneous broadcast speech with a high proportion of dialectal speech. The evaluation relies on a 140-h manually annotated proprietary corpus of television programs broadcast by Basque Radio Television, which includes dialect-level labels as well as standardized and pseudo-phonetic transcriptions. We find that recognition performance degrades significantly on dialectal speech relative to standard speech, for all dialects present in our corpus. We then provide a quantitative analysis of phonological phenomena based on single-word substitution errors and identify 52 recurrent phenomena, grouped into sound deletions, epentheses, and substitutions. We further show a modest but statistically significant correlation between the number of phonological phenomena in an utterance and its recognition error rate. Our findings highlight the limitations of dialect-agnostic evaluation and motivate linguistically informed, dialect-aware strategies for automatic speech recognition in low-resource and typologically diverse languages.
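
As a rough illustration of the quantities mentioned above, the following Python sketch computes a word error rate from a word-level edit-distance alignment, collects single-word substitution pairs, and correlates per-utterance phenomenon counts with error rates. It is a minimal sketch under stated assumptions, not the pipeline used in the paper: the function names `align` and `pearson` are hypothetical, whitespace tokenization is assumed, Pearson correlation is only one possible choice of statistic, and the example strings and counts are toy values that do not come from the corpus.

```python
def align(ref, hyp):
    """Return (wer, substitutions) for a reference and a hypothesis string.

    Assumption: both strings are whitespace-tokenized; substitutions are
    (reference_word, hypothesis_word) pairs found on one minimal alignment.
    """
    r, h = ref.split(), hyp.split()
    # d[i][j] = word-level edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    # Backtrace through the table to recover the substituted word pairs.
    subs = []
    i, j = len(r), len(h)
    while i > 0 and j > 0:
        cost = 0 if r[i - 1] == h[j - 1] else 1
        if d[i][j] == d[i - 1][j - 1] + cost:
            if cost:
                subs.append((r[i - 1], h[j - 1]))
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    subs.reverse()
    wer = d[len(r)][len(h)] / len(r) if r else 0.0
    return wer, subs


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Toy example (not corpus data): one substituted word in a three-word utterance.
wer, subs = align("etxera joan naiz", "etxea joan naiz")

# Toy per-utterance values: phenomenon counts and error rates (hypothetical).
r_value = pearson([0, 1, 2], [0.0, 0.5, 1.0])
```

Each substitution pair returned by `align` could then be inspected for the sound deletion, epenthesis, or substitution that separates the two word forms; how that comparison is carried out is left open here.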