Meta4XNLI-ptBR: Brazilian Portuguese Extension of Meta4XNLI Corpus

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/45566xcgz65x

Abstract

Metaphor is a pervasive phenomenon in language that shapes how people conceptualize and communicate complex ideas. Detecting and interpreting metaphor is relevant not only for linguistic theory but also for many Natural Language Processing (NLP) applications, such as machine translation and sentiment analysis. Despite this relevance, no open-source annotated corpus of metaphors exists for Brazilian Portuguese, one of the world’s most widely spoken languages. This paper addresses this gap by presenting Meta4XNLI-ptBR, an extension of Meta4XNLI with token-level metaphor annotation in Brazilian Portuguese. To build it, we propose a pipeline that combines automatic translation via language models with human annotation, following guidelines adapted from MIPVU and Meta4XNLI. The final corpus contains 1,784 human-annotated sentences, of which 42.26% contain at least one metaphorical token. To our knowledge, this is the first open corpus of its kind for Brazilian Portuguese, and it is freely available.

Details

Paper ID
lrec2026-main-131
Pages
pp. 1668-1676
BibKey
johansson-etal-2026-meta4xnli
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Karina Johansson
  • Fernanda Assi
  • Isabella da Silva
  • Rafael Passador
  • Isabela Rodrigues
  • Aline Paes
  • Helena Caseli

Links