LREC 2022 main

CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/34bje4jp4zxa

Abstract

We introduce the CRASS (counterfactual reasoning assessment) data set and benchmark utilizing questionized counterfactual conditionals as a novel and powerful tool to evaluate large language models. We present the data set design and benchmark. We test six state-of-the-art models against our benchmark. Our results show that it poses a valid challenge for these models and opens up considerable room for their improvement.

Details

Paper ID
lrec2022-main-229
Pages
pp. 2126-2140
BibKey
frohberg-binder-2022-crass
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Jörg Frohberg

  • Frank Binder

Links