
Creation and Validation of a Monolingual Spanish NLI Dataset for Metaphor Interpretation via Model-in-the-Loop

Proceedings of Learning Non-Literal Expressions with Small Data @ LREC 2026

DOI:10.63317/3j5ogzih7uuv

Abstract

Large Language Models (LLMs) can easily generate fluent text, but assessing whether they truly understand metaphors requires moving beyond English-centric datasets and binary token classification tasks. To test whether current state-of-the-art models perform genuine structural alignment and analogical reasoning rather than just echoing statistical token co-occurrence, we introduce a new monolingual Spanish Natural Language Inference (NLI) dataset specifically built for metaphor interpretation. Using a Model-in-the-Loop approach, we reconstruct the literal truth conditions of metaphors sourced from science texts. Before human experts curated the data, we performed an ablation study, evaluated via BERTScore and cross-entropy, to test whether explicit symbolic scaffolding improves analogical reasoning. While automated evaluations suggested that forcing models to follow explicit metaphorical rules diminished their fluency and increased text surprisal, human evaluation revealed the opposite: this explicit guidance produced far more accurate and strictly literal outputs. This reveals a limitation in how we evaluate Natural Language Understanding (NLU): automated metrics consistently penalize the cognitive ‘heavy lifting’ required to resolve a metaphor, simply because they are built to reward surface-level statistical fluency. By releasing this resource, we aim to shift the focus from surface-level generation to real cognitive alignment and metaphorical understanding in Spanish NLU.
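
The abstract does not spell out how cross-entropy or surprisal was computed. As a minimal sketch, assuming surprisal is taken as the mean negative log-probability a language model assigns to each output token, the snippet below shows why a less predictable (for example, more strictly literal) paraphrase can score worse on this metric regardless of its accuracy. The function name `mean_surprisal` and all probability values are hypothetical and invented for illustration; they are not taken from the paper.

```python
import math


def mean_surprisal(token_probs):
    """Mean negative log-probability per token (cross-entropy in nats)."""
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# Hypothetical per-token probabilities (illustrative only, not from the paper):
# a fluent, predictable output versus a more literal, less predictable one.
fluent_probs = [0.5, 0.5, 0.5, 0.5]
literal_probs = [0.5, 0.25, 0.125, 0.25]

fluent_ce = mean_surprisal(fluent_probs)
literal_ce = mean_surprisal(literal_probs)

# Perplexity is the exponential of the mean cross-entropy.
print(f"fluent:  CE={fluent_ce:.4f} nats, perplexity={math.exp(fluent_ce):.2f}")
print(f"literal: CE={literal_ce:.4f} nats, perplexity={math.exp(literal_ce):.2f}")
```

Under these assumed numbers the literal output receives the higher cross-entropy purely because its tokens are less expected, which is the kind of penalty the abstract attributes to fluency-oriented metrics.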

Details

Paper ID: lrec2026-ws-nonliteral-07
Pages: pp. 77-87
BibKey: sanchezmontero-etal-2026-creation
Editors: Markus Egg, Valia Kordoni
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of Learning Non-Literal Expressions with Small Data @ LREC 2026
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Alec Sanchez-Montero
  • Gemma Bel-Enguix
  • Sergio Luis Ojeda Trueba

Links