Framed Multi30K: A Frame-Based Multimodal-Multilingual Dataset
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Abstract
This paper presents Framed Multi30K (FM30K), a novel frame-based Brazilian Portuguese multimodal-multilingual dataset which (i) extends the Multi30K dataset (Elliott et al., 2016) with 158,915 original Brazilian Portuguese descriptions and 30,104 Brazilian Portuguese translations of original English descriptions; (ii) adds 2,677,613 frame evocation labels to the 158,915 English descriptions and to the ones created for Brazilian Portuguese; and (iii) extends the Flickr30k Entities dataset (Plummer et al., 2015) with 190,608 correlations linking frames and Frame Elements to the existing phrase-to-region correlations.
Details
Authors
- Marcelo Viridiano
- Arthur Lorenzi
- Tiago Timponi Torrent
- Ely E. Matos
- Adriana S. Pagano
- Natália Sathler Sigiliano
- Maucha Gamonal
- Helen de Andrade Abreu
- Lívia Vicente Dutra
- Mairon Samagaio
- Mariane Carvalho
- Franciany Campos
- Gabrielly Azalim
- Bruna Mazzei
- Mateus Fonseca de Oliveira
- Ana Carolina Luz
- Livia Padua Ruiz
- Júlia Bellei
- Amanda Pestana
- Josiane Costa
- Iasmin Rabelo
- Anna Beatriz Silva
- Raquel Roza
- Mariana Souza Mota
- Igor Oliveira
- Márcio Henrique Pelegrino de Freitas