LREC 2026, Main Conference

Benchmarking Portuguese Open Information Extraction

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2wuxjj5bo7ax

Abstract

Open Information Extraction (OIE) has seen significant advancements for English, but progress in Portuguese has been hindered by a lack of resources such as datasets and standardized evaluation benchmarks. This work addresses this critical gap by establishing a systematic and reproducible benchmark for Portuguese OIE systems. We conduct a comprehensive evaluation of eight systems, spanning a decade of research and encompassing both rule-based and neural architectures. The performance of these systems is measured against three distinct Portuguese corpora (WIKI200, CETEN200, and Gamalho) using the established CaRB methodology. Our results reveal that no single system excels across all three datasets. Rule-based models perform strongly on general text (WIKI200, CETEN200) but falter on specialized corpora (Gamalho), while neural systems demonstrate more consistent but not superior performance. With overall F1 scores averaging around 40%, our findings confirm that Portuguese OIE remains a largely unsolved task. This benchmark provides a baseline for future research and highlights the need for a high-quality, manually annotated gold-standard dataset to drive meaningful progress in the field. The evaluation framework is publicly available at https://github.com/gabrielrsilva11/PT-OIE-Benchmark.

Details

Paper ID
lrec2026-main-610
Pages
pp. 7692-7700
BibKey
silva-etal-2026-benchmarking
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Gabriel Silva

  • Mário Rodrigues

  • António Teixeira

  • Marlene Amorim

Links