LREC 2026 workshop

Can LLM Agents Identify Spoken Dialects like a Linguist?

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

DOI:10.63317/27m2tbgjcat8

Abstract

Due to the scarcity of labeled dialectal speech, audio dialect classification is a challenging task for most languages, including Swiss German. In this work, we explore how well large language models (LLMs), used as agents, understand dialects, and whether they can match the performance of models such as HuBERT in dialect classification. In addition, we provide an LLM baseline and a human linguist baseline. Our approach uses phonetic transcriptions produced by ASR systems and combines them with linguistic resources such as dialect feature maps, vowel history, and rules. Our findings indicate that the LLM predictions improve when linguistic information is provided. The human baseline shows that automatically generated transcriptions can be beneficial for such classification, but also that they leave room for improvement.

Details

Paper ID
lrec2026-ws-dialres-08
Pages
pp. 83-92
BibKey
bystrich-etal-2026-can
Editors
Antonis Anastasopoulos, Stella Markantonatou, Angela Ralli, Marcos Zampieri, Stavros Bompolas, Vivian Stamou
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Tobias Bystrich
  • Lukas Hamm
  • Maria Hassan Akhter
  • Lea Fischbach
  • Lucie Flek
  • Akbar Karimi
