LREC-COLING 2024 Main

The Effects of Pretraining in Video-Guided Machine Translation

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/3ut4fw9btu56

Abstract

We propose an approach that improves the performance of VMT (Video-guided Machine Translation) models, which integrate text and video modalities. We experiment with the MAD (Movie Audio Descriptions) dataset, a new dataset that contains transcribed audio descriptions of movies. We find that the MAD dataset is more lexically rich than the VATEX dataset (the current VMT baseline), so we pretrain on MAD to improve performance on VATEX. We compare two video encoder architectures: a Conformer (Convolution-augmented Transformer) and a Transformer. Additionally, we mask the source sentences to assess how much each architecture's performance improves due to pretraining on additional video data. Finally, we analyze the transfer learning potential of a video dataset and compare it to pretraining on a text-only dataset. Our findings demonstrate that pretraining on a lexically rich dataset leads to significant improvements in model performance when models use both text and video modalities.
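The abstract mentions masking the source sentences so that the model must rely more heavily on the video modality. The paper's exact masking scheme is not given here; the sketch below is a hypothetical illustration of token-level source masking, where `mask_rate` and the `<mask>` token are assumptions for the example.

```python
import random


def mask_source_tokens(tokens, mask_rate=1.0, mask_token="<mask>", seed=None):
    """Replace a fraction of source tokens with a mask token.

    With mask_rate=1.0 the source sentence is fully masked, forcing the
    model to translate from the video signal alone; intermediate rates
    probe how much the model leans on each modality.
    """
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_rate else t for t in tokens]


# Fully masked: the text encoder sees no lexical content.
print(mask_source_tokens(["a", "cat", "sits"], mask_rate=1.0))
# Unmasked: the source sentence passes through unchanged.
print(mask_source_tokens(["a", "cat", "sits"], mask_rate=0.0))
```

At evaluation time, comparing BLEU under full masking against the unmasked setting gives a rough measure of how much the video encoder contributes on its own.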

Details

Paper ID
lrec2024-main-1380
Pages
pp. 15888-15898
BibKey
shurtz-etal-2024-effects
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20–25 May 2024

Authors

  • Ammon Shurtz
  • Lawry Sorenson
  • Stephen D. Richardson

Links