
LFQA-HP-1M: A Large-Scale Human Preference Dataset for Long-Form Question Answering

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI: 10.63317/2oq3nh8a3zff

Abstract

Long-form question answering (LFQA) demands nuanced evaluation of multi-sentence explanatory responses, yet existing metrics often fail to reflect human judgment. We present LFQA-HP-1M, a large-scale dataset comprising 1.3M human pairwise preference annotations for LFQA. We propose nine rubrics for answer-quality evaluation and show that simple linear models based on these features perform comparably to state-of-the-art LLM evaluators. We further examine transitivity, positional bias, and verbosity bias in LLM evaluators and demonstrate their vulnerability to adversarial perturbations. Overall, this work provides one of the largest public LFQA preference datasets and a rubric-driven framework for transparent and reliable evaluation.

Details

Paper ID
lrec2026-main-425
Pages
pp. 5450-5465
BibKey
jahan-etal-2026-lfqa
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Rafid Ishrak Jahan
  • Fahmid Shahriar Iqbal
  • Sagnik Ray Choudhury