LREC 2026 workshop

Exploring the reusability of Northern Kurdish resources for Badini speech recognition

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

DOI:10.63317/2gzjngtqiqp7

Abstract

Badini is a variant of the Kurdish language spoken in the Duhok province of the Kurdistan Region of Iraq. It is written mainly in a modified version of the Arabic script. Although it shares this script with Central Kurdish (CKB), it is linguistically classified under the Northern Kurdish (KMR) branch. In this paper, we explore the potential and limitations of Northern Kurdish ASR resources for the Badini variant. First, we transliterate the Common Voice 18 dataset from the Latin script into the modified Arabic script and revise it to align with the orthographic conventions of the Badini variant. Second, we introduce the first text collection for the Badini variant, containing 14.22 million tokens, which serves as a source for speech synthesis. The third resource developed in this research is a standard speech recognition benchmark, recorded by 5 speakers, which includes 2 hours and 46 minutes of multi-domain read speech. Results show that combining transliterated and synthetic data significantly improves recognition accuracy, achieving a 6.8% CER and 34% WER. All three resources curated during this research will be made available under the CC BY-NC-ND 4.0 license.
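For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how character error rate (CER) and word error rate (WER) are conventionally computed from edit distance. It is not the authors' evaluation code. The function names, the whitespace tokenization, the choice to count spaces as characters, and the example strings (English placeholders, not Badini data) are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level error rate: total edits divided by total reference units.

    unit="word" splits on whitespace (assumption); unit="char" uses every
    character, including spaces (assumption). Raises ZeroDivisionError if
    all references are empty.
    """
    tokenize = str.split if unit == "word" else list
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    return edits / total


def wer(refs, hyps):
    return error_rate(refs, hyps, "word")


def cer(refs, hyps):
    return error_rate(refs, hyps, "char")


# Placeholder example: one substituted character inside one word.
references = ["the cat sat"]
hypotheses = ["the cat sit"]
print(f"WER = {wer(references, hypotheses):.3f}")
print(f"CER = {cer(references, hypotheses):.3f}")
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER on the same output, which is consistent with the gap between the two figures reported above.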

Details

Paper ID: lrec2026-ws-dialres-11
Pages: pp. 110-115
BibKey: mohammadamini-etal-2026-exploring
Editors: Antonis Anastasopoulos, Stella Markantonatou, Angela Ralli, Marcos Zampieri, Stavros Bompolas, Vivian Stamou
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Mohammad Mohammadamini
  • Aveen Jalal Mohammed
  • Barzan Hussein Mohammed
  • Dezheen H. Abdulazeez
  • Imad Saeed Sadeeq
  • Dilgash Mohammed Salih
  • Amera Ismail Melhum
  • Abuobaida Abdullah Dheyab

Links