LREC-COLING 2024 main

Grounded Multimodal Procedural Entity Recognition for Procedural Documents: A New Dataset and Baseline

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4a8babqbfgx5

Abstract

Much commonsense knowledge in the real world takes the form of procedures, or sequences of steps to achieve particular goals. In recent years, knowledge extraction from procedural documents has attracted considerable attention. However, existing studies often focus on procedural text and ignore the multimodal scenario that is common in the real world. Images and text can complement each other semantically, alleviating the semantic ambiguity of the text-only modality. Motivated by this, in this paper we explore the problem of grounded multimodal procedural entity recognition (GMPER), which aims to detect each entity and its corresponding bounding box groundings in the image (i.e., visual entities). A new dataset (Wiki-GMPER) is built, and extensive experiments are conducted to evaluate the effectiveness of our proposed model.

Details

Paper ID
lrec2024-main-0702
Pages
pp. 7971-7981
BibKey
ren-etal-2024-grounded
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Haopeng Ren
  • Yushi Zeng
  • Yi Cai
  • Zhenqi Ye
  • Li Yuan
  • Pinli Zhu

Links