LREC 2026 workshop

Sociolinguistic aspects of crowdsourcing for a vocal corpus of Alsatian

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

DOI:10.63317/5ch7cwah438g

Abstract

Alsatian is a low-resource regional language spoken in a majority-language context. To create a voice dataset suited for training automatic speech recognition and speech-to-text models, we launched a crowdsourcing campaign on the Mozilla Common Voice platform. We describe the sociolinguistic issues we encountered, such as participants’ perception of their own language and its role in the AI landscape, which must be addressed to raise participation in the crowdsourcing effort. We found that participants are often confused about NLP and AI tools and have a strong interest in preserving their language.

Details

Paper ID
lrec2026-ws-dialres-25
Pages
pp. 256-264
BibKey
erhart-etal-2026-sociolinguistic
Editors
Antonis Anastasopoulos, Stella Markantonatou, Angela Ralli, Marcos Zampieri, Stavros Bompolas, Vivian Stamou
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Pascale Erhart
  • Lucile Hamm
  • Sam Bigeard
  • Carole Werner
  • Malek Yaich
  • Slim Ouni

Links