Back to Main Conference 2024
LREC-COLING 2024main

Abstractive Multi-Video Captioning: Benchmark Dataset Construction and Extensive Evaluation

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/2686x3arx9ts

Abstract

This paper introduces a new task, abstractive multi-video captioning, which focuses on abstracting multiple videos with natural language. Unlike conventional video captioning tasks generating a specific caption for a video, our task generates an abstract caption of the shared content in a video group containing multiple videos. To address our task, models must learn to understand each video in detail and have strong abstraction abilities to find commonalities among videos. We construct a benchmark dataset for abstractive multi-video captioning named AbstrActs. AbstrActs contains 13.5k video groups and corresponding abstract captions. As abstractive multi-video captioning models, we explore two approaches: end-to-end and cascade. For evaluation, we proposed a new metric, CocoA, which can evaluate the model performance based on the abstractness of the generated captions. In experiments, we report the impact of the way of combining multiple video features, the overall model architecture, and the number of input videos.

Details

Paper ID
lrec2024-main-0005
Pages
pp. 57-69
BibKey
takahashi-etal-2024-abstractive
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 25 May 2024

Authors

  • RT

    Rikito Takahashi

  • HK

    Hirokazu Kiyomaru

  • CC

    Chenhui Chu

  • SK

    Sadao Kurohashi

Links