
Enhancing and Evaluating Tabular Models on the Fly via Synthetic Question–Answer Generation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/27e3cist39z2

Abstract

Question answering (QA) over tabular data has traditionally been a challenging task, but LLMs have recently shown the ability to answer questions about this type of structured data. However, current tabular QA datasets are skewed toward Wikipedia tables and SQL-style answers, and are composed of human-crafted question–answer pairs. This limits the evaluation of LLMs on this task to a narrow genre of data and language, while also requiring extensive human effort for dataset or benchmark creation. To address this, we introduce SynTabQA, a methodology for the automatic generation of synthetic question–answer pairs from any unannotated table. SynTabQA defines a detailed question typology, enabling fine-grained evaluation and facilitating the creation of diverse QA datasets. Our approach not only provides an automated test bed for any tabular dataset but can also be used in few-shot settings to supply LLMs with tailored examples, improving their focus and accuracy. We validate SynTabQA on two large, manually constructed tabular QA benchmarks of distinct nature.

Details

Paper ID
lrec2026-main-421
Pages
pp. 5389-5413
BibKey
grijalba-etal-2026-enhancing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Jorge Osés Grijalba

  • Eugenio Martínez Cámara

  • L. Alfonso Ureña-López

  • Jose Camacho-Collados

Links