Arabic Speech Recognition of zero-resourced Languages: A case of Shehri (Jibbali) Language

Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024

DOI: 10.63317/5ozhxig2mmtz

Abstract

Many under-resourced languages lack the computational resources needed for automatic speech recognition (ASR) because annotated speech data is scarce, which makes developing accurate ASR models challenging. Shehri (Jibbali), spoken in Oman, is one such language: it lacks extensive annotated speech data. This paper aims to build an improved ASR model for this under-resourced language. We collected a Shehri (Jibbali) speech corpus and applied transfer learning by fine-tuning pre-trained ASR models on this dataset. Specifically, Wav2Vec2.0, HuBERT and Whisper were fine-tuned using techniques such as parameter-efficient fine-tuning. Evaluation with word error rate (WER) and character error rate (CER) showed that Whisper, fine-tuned on the Shehri (Jibbali) dataset, significantly outperformed the other models; the best result, from Whisper-medium, was 3.5% WER. This demonstrates the effectiveness of transfer learning for resource-constrained tasks and shows the high zero-shot performance of pre-trained models.
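The abstract reports results as WER and CER. As a minimal sketch of how these two metrics are commonly defined (not the authors' evaluation code), both can be computed as the Levenshtein edit distance between a reference transcript and a model hypothesis, divided by the reference length in words or characters. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and the sketch assumes whitespace tokenization with no text normalization, which a real evaluation would likely handle differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a WER of 3.5% corresponds to roughly 3.5 word-level edits per 100 reference words. Both metrics can exceed 1.0 when the hypothesis contains many insertions.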

Details

Paper ID

lrec2024-ws-osact-10

Pages

pp. 84-92

DOI

10.63317/5ozhxig2mmtz

BibKey

alrashoudi-etal-2024-arabic

Editors

Hend Al-Khalifa, Kareem Darwish, Hamdy Mubarak, Mona Ali, Tamer Elsayed

Publisher

European Language Resources Association (ELRA) and ICCL

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024

Location

Turin, Italy

Date

20-25 May 2024

Authors

Norah A. Alrashoudi
Omar Said Alshahri
Hend Al-Khalifa