Back to Main Conference 2024
LREC-COLING 2024main

Multimodal Cross-lingual Phrase Retrieval

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4yugrxf549hv

Abstract

Cross-lingual phrase retrieval aims to retrieve parallel phrases among languages. Current approaches only deals with textual modality. There lacks multimodal data resources and explorations for multimodal cross-lingual phrase retrieval (MXPR). In this paper, we create the first MXPR data resource and propose a novel approach for MXPR to explore the effectiveness of multi-modality. The MXPR data resource is built by marrying the benchmark dataset for textual cross-lingual phrase retrieval with Wikimedia Commons, which is a media store containing tremendous texts and related images. In the built resource, the phrase pairs of the textual benchmark dataset are equipped with their related images. Based on this novel data resource, we introduce a strategy to bridge the gap between different modalities by multimodal relation generation with a large multimodal pre-trained model and consistency training. Experiments on benchmarked dataset covering eight language pairs show that our MXPR approach, which deals with multimodal phrases, performs significantly better than pure textual cross-lingual phrase retrieval.

Details

Paper ID
lrec2024-main-1040
Pages
pp. 11917-11927
BibKey
dong-etal-2024-multimodal
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 25 May 2024

Authors

  • CD

    Chuanqi Dong

  • WZ

    Wenjie Zhou

  • XD

    Xiangyu Duan

  • YZ

    Yuqi Zhang

  • MZ

    Min Zhang

Links