
Towards ParlaMint-DE: Improving the Interoperability of the GermaParl Corpus of Plenary Protocols of the German Bundestag

Proceedings of the ParlaCLARIN V Workshop on Interoperability, Multilinguality, and Multimodality in Parliamentary Corpora

DOI:10.63317/45jehvzrtdjp

Abstract

As the number of machine-readable corpora of plenary protocols continues to grow, questions about the potential of harmonisation and shared encoding standards gain prominence. Interoperable corpora can enable innovative research, particularly comparative analyses. The ParlaMint encoding schema introduced by CLARIN provides comprehensive guidelines towards this goal. This contribution shows how GermaParl, a large corpus of plenary protocols of the German Bundestag, is transformed from a TEI-inspired XML format to the ParlaMint encoding schema. Building on previous work, the paper presents an adjusted preparation pipeline and discusses the challenges of moving an established resource to a new data format. The prospective ParlaMint-DE corpus will make the plenary debates in Germany from 1949 to 2025 available in a highly interoperable data format. Clear documentation and taxonomies increase the usefulness of the resource for comparative analyses, while additional metadata and linguistic annotation broaden its general applicability.

Details

Paper ID
lrec2026-ws-parlaclarin-05
Pages
pp. 31-43
BibKey
leonhardt-etal-2026-parlamint
Editors
Maria Eskevich, Vincent Vandeghinste, David Bodron
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the ParlaCLARIN V Workshop on Interoperability, Multilinguality, and Multimodality in Parliamentary Corpora
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Christoph Leonhardt

  • Andreas Blätte
