LREC 2026 main

ParlaSpeech 3.0: Richly Annotated Spoken Parliamentary Corpora of Croatian, Czech, Polish, and Serbian

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3b7dnbjr75es

Abstract

ParlaSpeech is a collection of spoken parliamentary corpora currently spanning four Slavic languages (Croatian, Czech, Polish, and Serbian) with a total size of more than 6,000 hours. The corpora were built automatically from the ParlaMint transcripts and their metadata, which were aligned to the speech recordings of the corresponding parliaments. In this release, each corpus has been significantly enriched with several automatic annotation layers. The textual modality of all four corpora has been enriched with linguistic annotations and sentiment predictions, while the spoken modality has been automatically annotated for filled pauses, the most frequent type of disfluency in typical speech. Two of the languages have additionally been enriched with detailed word- and grapheme-level alignments and with automatic annotation of the position of primary stress in multisyllabic words. These enrichments greatly increase the usefulness of the corpora for downstream research across multiple disciplines, which we showcase through an analysis of acoustic correlates of sentiment. All corpora are available for download in JSONL and TextGrid formats, as well as for search through a concordancer.

Details

Paper ID
lrec2026-main-447
Pages
pp. 5677-5688
BibKey
ljubei-etal-2026-parlaspeech
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Nikola Ljubešić
  • Peter Rupnik
  • Ivan Porupski
  • Taja Kuzman Pungeršek
