LREC 2022 main

BehanceCC: A ChitChat Detection Dataset For Livestreaming Video Transcripts

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/43hu2w6t9wyt

Abstract

Livestreaming videos have become an effective broadcasting method for both video sharing and educational purposes. However, livestreaming videos contain a considerable amount of off-topic content (up to 50%), which introduces significant noise and data load to downstream applications. This paper presents BehanceCC, a new human-annotated benchmark dataset for off-topic detection (also called chitchat detection) in livestreaming video transcripts. In addition to describing the challenges of the dataset, our extensive experiments with various baselines reveal the complexity of chitchat detection for livestreaming videos and suggest potential future research directions for this task. The dataset will be made publicly available to foster research in this area.

Details

Paper ID
lrec2022-main-791
Pages
pp. 7284-7290
BibKey
lai-etal-2022-behancecc
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 June 2022 to 25 June 2022

Authors

  • Viet Lai
  • Amir Pouran Ben Veyseh
  • Franck Dernoncourt
  • Thien Nguyen

Links