LREC 2026 main

Voices across Decades: A Multimodal Diachronic Corpus of German Bundestag Debates (GerParlDia-MM)

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3vgihkgnkg75

Abstract

This paper presents a multimodal diachronic corpus of German parliamentary debates spanning 1949 to 2025. The dataset focuses on speakers with exceptionally long political careers in the Bundestag, covering at least six parliamentary terms for female members and eight for male members. It comprises 75 individuals (43 male, 32 female) and 2,136 speeches. The corpus integrates audio, video (when available), and official transcripts, enriched with metadata on date, party affiliation, and legislative term. Transcripts were temporally aligned with parliamentary media recordings, and non-speech segments were automatically removed. The corpus enables research on voice aging, intra-speaker variability, and longitudinal political language, and supports benchmarking of ASR and speaker recognition across decades. It thus bridges the gap between short-term speech corpora and single-speaker longitudinal datasets, offering a unique foundation for studying change in voice, style, and rhetoric over more than seventy years of German parliamentary history.

Details

Paper ID
lrec2026-main-498
Pages
pp. 6289-6297
BibKey
siegert-2026-voices
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Ingo Siegert

Links