LREC 2022 (main conference)

Bazinga! A Dataset for Multi-Party Dialogues Structuring

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/5hw96y77itih

Abstract

We introduce Bazinga!, a dataset built around a large collection of TV and movie series, which are rich in challenging multi-party dialogues. TV series also come with a very active fan base, which makes it possible to collect metadata and accelerates annotation. Covering 16 TV and movie series, Bazinga! amounts to 400+ hours of speech and 8M+ tokens, including 500K+ tokens annotated with speaker, addressee, and entity linking information. Along with the dataset, we also provide a baseline for speaker diarization, punctuation restoration, and person entity recognition. The results demonstrate the difficulty of these tasks and of transfer learning from models trained on mono-speaker audio or written text, which are more widely available. This work is a step towards better multi-party dialogue structuring and understanding. Bazinga! is available at hf.co/bazinga. Because a large part of Bazinga! is only partially annotated, we also expect the dataset to foster research on self- or weakly-supervised learning methods.

Details

Paper ID
lrec2022-main-367
Pages
pp. 3434-3441
BibKey
lerner-etal-2022-bazinga
Editors
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 - 25 June 2022

Authors

  • Paul Lerner
  • Juliette Bergoënd
  • Camille Guinaudeau
  • Hervé Bredin
  • Benjamin Maurice
  • Sharleyne Lefevre
  • Martin Bouteiller
  • Aman Berhe
  • Léo Galmant
  • Ruiqing Yin
  • Claude Barras
