LREC 2026 Main Conference

EduBench: A Portuguese Benchmark for Open-Ended Discursive Question Answering

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4nocig8f36r9

Abstract

Evaluating open-ended text generation in large language models remains challenging, particularly for non-English languages. We introduce EduBench, a comprehensive Portuguese-language benchmark comprising 3,149 discursive questions from Brazilian university entrance examinations spanning 2015–2025. Unlike multiple-choice or extractive QA benchmarks, EduBench requires extended, argumentative responses across diverse domains, including Humanities, Exact and Natural Sciences, and Languages. Each question includes expert-curated reference answers from official sources, rich metadata, and automated image descriptions to support text-only evaluation. We establish baseline results using nine contemporary models, ranging from 4B-parameter SLMs to state-of-the-art reasoning-capable LLMs, and evaluate them using complementary metrics (BLEU, BERTScore, G-Eval). Our results reveal substantial metric disagreement and highlight the complexity of assessing discursive generation, with models achieving 54–71% alignment with expert answers. We release EduBench publicly to support research on Portuguese NLP and open-ended generation evaluation.

Details

Paper ID
lrec2026-main-360
Pages
pp. 4587-4596
BibKey
paiola-etal-2026-edubench
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Pedro Henrique Paiola
  • Luís Gabriel Damiati Mendes
  • Bruno de Oliveira Monchelato
  • André da Fonseca Schuck
  • Gabriel Lino Garcia
  • Douglas Rodrigues
  • Helena de Medeiros Caseli
  • João Paulo Papa
