LREC 2026 workshop

First Steps in ASR for Cypriot Greek: Challenges and Insights

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

DOI:10.63317/2iznmge7hdtx

Abstract

This paper presents the first automatic speech recognition (ASR) system for Cypriot Greek, a non-standardized variety of Modern Greek with distinctive phonological, lexical, and orthographic characteristics. We adapt Whisper, a state-of-the-art multilingual ASR model, to Cypriot Greek by fine-tuning it on the Mozilla Common Voice Spontaneous Speech dataset for Cypriot Greek. The phonological and lexical divergence of Cypriot Greek from Standard Modern Greek poses significant challenges for mainstream ASR, particularly under conditions of limited training data and dialectal variation. In our experiments, whisper-medium achieved a best word error rate (WER) of 37.85%, while whisper-large-v3 consistently outperformed it, reaching a minimum WER of 33.93%. These findings suggest that increased model size, combined with targeted fine-tuning on normalized dialectal data, significantly improves recognition accuracy, and that careful handling of orthographic and dialectal variation provides an effective path for adapting ASR to low-resource varieties.
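For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed (word-level edit distance divided by the number of reference words) and what the gap between the two reported scores amounts to in relative terms. It is an illustration only, not the authors' evaluation code: the function names `word_error_rate` and `relative_reduction` are hypothetical, the toy strings are placeholders rather than data from the paper, and the text normalization the abstract mentions is assumed to have been applied beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes both strings are already normalized (casing, punctuation,
    orthography); whitespace tokenization is a simplifying assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the improved WER is lower than the baseline WER."""
    return (baseline_wer - improved_wer) / baseline_wer


# Toy example: one substitution and one deletion over four reference words.
toy_wer = word_error_rate("a b c d", "a x c")  # 0.5

# Scores reported in the abstract (percent): whisper-medium vs whisper-large-v3.
gain = relative_reduction(37.85, 33.93)
```

With the reported figures, `gain` comes to roughly 0.104, so the 3.92-point absolute difference corresponds to about a 10% relative reduction in WER.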

Details

Paper ID
lrec2026-ws-dialres-30
Pages
pp. 308-314
BibKey
stamou-etal-2026-first
Editors
Antonis Anastasopoulos, Stella Markantonatou, Angela Ralli, Marcos Zampieri, Stavros Bompolas, Vivian Stamou
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Vivian Stamou
  • Spyros Armostis
  • Antigoni Klimi
  • Georgios Paraskevopoulos
  • Vassilis Katsouros
  • Antonios Anastasopoulos
