LREC-COLING 2024 (main conference)

Towards a Danish Semantic Reasoning Benchmark - Compiled from Lexical-Semantic Resources for Assessing Selected Language Understanding Capabilities of Large Language Models

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4ick78zs79e7

Abstract

We present the first version of a semantic reasoning benchmark for Danish, compiled semi-automatically from a number of human-curated lexical-semantic resources that serve as our gold standard. Taken together, the datasets constitute a benchmark for assessing selected language understanding capabilities of large language models (LLMs) for Danish. This first version comprises 25 datasets across 6 different tasks and includes 3,800 test instances. Although the benchmark is still somewhat limited in size, it goes beyond comparable evaluation datasets for Danish by including both negative and contrastive examples as well as low-frequency vocabulary, aspects that tend to challenge current LLMs when they rely substantially on language transfer. The datasets focus on features such as semantic inference and entailment, similarity, relatedness, and the ability to disambiguate words in context. We use ChatGPT to assess the degree to which our datasets challenge the ceiling performance of state-of-the-art LLMs. Average performance is relatively high, with an average accuracy of 0.6 on ChatGPT 3.5 Turbo and 0.8 on ChatGPT 4.0.

Details

Paper ID
lrec2024-main-1421
Pages
pp. 16353-16363
BibKey
pedersen-etal-2024-towards
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20-25 May 2024

Authors

  • Bolette Pedersen
  • Nathalie Sørensen
  • Sussi Olsen
  • Sanni Nimb
  • Simon Gray
