LREC-COLING 2024 (Main)

Question Answering over Tabular Data with DataBench: A Large-Scale Empirical Evaluation of LLMs

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4wrzvsf2shaz

Abstract

Large Language Models (LLMs) are showing emergent abilities, and one of the most recently recognized is their ability to reason over and answer questions about tabular data. Although some datasets are available for assessing question answering systems on tabular data, they are not large and diverse enough to properly assess the capabilities of LLMs. To this end, we propose DataBench, a benchmark composed of 65 real-world datasets over several domains, including 20 human-generated questions per dataset, for a total of 1300 questions and answers. Using this benchmark, we perform a large-scale empirical comparison of several open- and closed-source models, including both code-generating and in-context learning models. The results highlight the current gap between open-source and closed-source models, with all types of models having room for improvement even on simple boolean questions or questions involving a single column.

Details

Paper ID
lrec2024-main-1179
Pages
pp. 13471-13488
BibKey
oses-grijalba-etal-2024-question
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20–25 May 2024

Authors

  • Jorge Osés Grijalba
  • L. Alfonso Ureña-López
  • Eugenio Martínez Cámara
  • Jose Camacho-Collados
