
German Dialects Across Situations, Generations, and Regions: The REDE corpus as an Oral Resource for NLP

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

DOI:10.63317/4fe4dkefqah9

Abstract

Recent advances in speech and language technologies increasingly rely on large and diverse corpora that represent linguistic variation across dialect regions, communicative situations, and social speaker characteristics. While substantial resources are available for Standard German, comparable spoken corpora for German dialects have so far been largely lacking, limiting the development and evaluation of dialect-sensitive NLP systems. The REDE corpus addresses this gap by providing a methodologically uniform collection of spoken German for 148 locations that systematically covers all major dialect areas in Germany. It comprises contemporary recordings collected in multiple elicitation and interaction settings, capturing variation across speaking styles, situational contexts, and speaker generations. With more than 1,500 hours of speech and rich metadata on regional and social dimensions, the REDE corpus constitutes a large-scale oral resource suitable for both linguistic research and NLP applications. This paper presents the design, structure, and methodological foundations of the corpus and discusses its relevance for current speech technology requirements.

Details

Paper ID: lrec2026-ws-dialres-15
Pages: pp. 144-152
BibKey: fischer-etal-2026-german
Editors: Antonis Anastasopoulos, Stella Markantonatou, Angela Ralli, Marcos Zampieri, Stavros Bompolas, Vivian Stamou
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Hanna Fischer

  • Alfred Lameli
