
From Generation to Evaluation: A Resource for Error-Categorized Question Generation from Video Transcripts

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5c7pqn99wt3a

Abstract

A key challenge in automated question generation is producing grammatically correct, error-free, and contextually relevant questions. While large language models handle this well, smaller models that can run on consumer-grade hardware face greater difficulties. Another obstacle is the lack of large, high-quality datasets, particularly of educational video transcripts, which limits the diversity and applicability of training data. Moreover, current evaluation methods either rely on strict comparison to a "ground truth", undervaluing valid but unmatched questions, or on expert judgments, which do not scale; neither provides insight into the nature of the errors. In this paper, we introduce a dataset of real-life educational video transcripts and investigate the question-generation capabilities of small language models by assessing their output against predefined error categories. We also present a novel approach to automatic quality assessment that classifies questions into these predefined error categories. We show that questions generated by small language models are still prone to errors. Our proposed classification approach outperforms baseline approaches and matches GPT-5's performance, reaching an accuracy of 72%.

Details

Paper ID
lrec2026-main-170
Pages
pp. 2166-2177
BibKey
berger-etal-2026-generation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Joshua Berger
  • Markos Stamatakis
  • Anett Hoppe
  • Ralph Ewerth
  • Christian Wartena
