LREC 2026 (main conference)

RelEx-PT: A Portuguese Sentence-Level Relation Extraction Dataset

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/473a5ncxrap4

Abstract

We introduce RelEx-PT, a new sentence-level Relation Extraction dataset for Portuguese. To address the scarcity of high-quality, controlled resources for the language, RelEx-PT provides a balanced benchmark comprising 18 Wikidata-derived relation types across diverse domains. The dataset is built with a distant supervision pipeline that links Wikidata triples to Portuguese Wikipedia sentences, complemented by a Natural Language Inference (NLI)-based filtering step, thus combining scalability with quality assurance. We also conduct baseline experiments to evaluate the dataset’s applicability across diverse extraction settings, including Relation Classification (RC), Relation Triple Extraction, and Open Information Extraction, using both prompting and fine-tuning strategies with Large Language Models. The results show that RelEx-PT effectively supports a range of extraction paradigms, yielding high performance in RC and competitive results in structured triple generation, while also highlighting key challenges in open-ended extraction.
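The abstract describes the construction pipeline only at a high level. As an illustration of the general technique (distant supervision followed by NLI filtering), here is a minimal Python sketch; it is not the authors' implementation. Everything in it is an assumption for illustration: the `Triple`/`Candidate` types, the functions `distant_supervision`, `nli_filter` and `toy_entail_prob`, the 0.9 threshold, and the Portuguese hypothesis templates are all hypothetical. The two Wikidata properties used (P19, place of birth; P36, capital) are examples only and may or may not be among the 18 relations in RelEx-PT. `toy_entail_prob` is a token-overlap stand-in so the sketch runs without a model; a real pipeline would call an NLI classifier instead.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical NLI scorer signature: (premise, hypothesis) -> entailment probability.
EntailFn = Callable[[str, str], float]


@dataclass(frozen=True)
class Triple:
    subject: str   # subject entity label
    relation: str  # Wikidata property ID
    obj: str       # object entity label


@dataclass(frozen=True)
class Candidate:
    sentence: str
    triple: Triple


# Hypothetical Portuguese verbalisations of each relation, used as NLI hypotheses.
TEMPLATES = {
    "P19": "{s} nasceu em {o}.",       # place of birth
    "P36": "{o} é a capital de {s}.",  # capital
}


def distant_supervision(triples: Iterable[Triple], sentences: Iterable[str]) -> list[Candidate]:
    """Pair each sentence with every triple whose subject and object labels both occur in it."""
    triples = list(triples)
    candidates = []
    for sentence in sentences:
        for t in triples:
            # Naive exact substring matching; real entity linking would be more robust.
            if t.subject in sentence and t.obj in sentence:
                candidates.append(Candidate(sentence, t))
    return candidates


def nli_filter(candidates: Iterable[Candidate], entail_prob: EntailFn,
               threshold: float = 0.9) -> list[Candidate]:
    """Keep candidates whose sentence entails the verbalised triple at or above a threshold."""
    kept = []
    for c in candidates:
        template = TEMPLATES.get(c.triple.relation)
        if template is None:  # no verbalisation available for this relation
            continue
        hypothesis = template.format(s=c.triple.subject, o=c.triple.obj)
        if entail_prob(c.sentence, hypothesis) >= threshold:
            kept.append(c)
    return kept


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in for a real NLI model: fraction of hypothesis tokens found in the premise."""
    premise_tokens = set(re.findall(r"\w+", premise.lower()))
    hyp_tokens = re.findall(r"\w+", hypothesis.lower())
    if not hyp_tokens:
        return 0.0
    return sum(tok in premise_tokens for tok in hyp_tokens) / len(hyp_tokens)


triples = [
    Triple("Fernando Pessoa", "P19", "Lisboa"),
    Triple("Portugal", "P36", "Lisboa"),
]
sentences = [
    "Fernando Pessoa nasceu em Lisboa em 1888.",
    "Fernando Pessoa viveu em Lisboa durante a maior parte da vida.",
]

candidates = distant_supervision(triples, sentences)  # both sentences match the P19 triple
kept = nli_filter(candidates, toy_entail_prob)
for c in kept:
    print(c.triple.relation, "|", c.sentence)
# prints: P19 | Fernando Pessoa nasceu em Lisboa em 1888.
```

Distant supervision alone pairs both sentences with the place-of-birth triple, although only the first actually expresses it; the entailment check removes the second (score 0.8 against the 0.9 threshold). Raising the threshold trades recall for precision, which is the scalability versus quality balance the abstract refers to.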

Details

Paper ID
lrec2026-main-609
Pages
pp. 7681-7691
BibKey
pinto-etal-2026-relex
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Tomás Pinto

  • Catarina Silva

  • Hugo Gonçalo Oliveira
