
Ragability Benchmark: A Dataset and Library to Test LLMs on Inter-context Conflicts

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI: 10.63317/2ty3hnn3bgb9

Abstract

Knowledge conflicts are a challenging issue when applying retrieval-augmented generation (RAG) systems. In this paper, we propose a benchmark that tests how LLMs deal with inter-context knowledge conflicts that require implicit reasoning to resolve. The benchmark is built from empirical examples in which real entities are replaced by fantasy entities, ensuring that the model's internal knowledge does not influence how it handles conflicting external information. The benchmark can be used to assess current LLMs, and it can also be flexibly adapted for the in-depth evaluation of a specific RAG system on selected aspects of conflict identification. We also present an experiment applying the benchmark to seven current LLMs from different model families. The results show that LLMs are able to identify conflicting contexts ('Is there a contradiction, yes or no?'), while they struggle with answering content-related queries. Adding a hint that the provided contexts might contain a contradiction improves conflict identification for contradictory contexts, but significantly decreases performance for non-contradictory contexts.
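
To make the evaluation protocol concrete, the sketch below shows how the two query conditions from the abstract (with and without a contradiction hint) could be run over pairs of possibly conflicting contexts. It is a minimal illustration assuming a generic `ask_llm` callable; the item fields, prompt wording, and function names are hypothetical and do not reflect the paper's actual library API.

```python
# Hypothetical harness for the conflict-identification condition described in
# the abstract. All names and prompt wording are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ConflictItem:
    context_a: str       # first retrieved passage (fantasy entities)
    context_b: str       # second passage; may contradict context_a
    contradictory: bool  # gold label: do the two contexts conflict?


def build_prompt(item: ConflictItem, hint: bool) -> str:
    """Assemble both contexts, an optional contradiction hint, and the query."""
    parts = [f"Context 1: {item.context_a}", f"Context 2: {item.context_b}"]
    if hint:
        parts.append("Note: the contexts above may contradict each other.")
    parts.append("Is there a contradiction between the contexts? Answer yes or no.")
    return "\n".join(parts)


def conflict_id_accuracy(
    items: Sequence[ConflictItem],
    ask_llm: Callable[[str], str],
    hint: bool,
) -> float:
    """Fraction of items where the model's yes/no answer matches the gold label."""
    correct = 0
    for item in items:
        answer = ask_llm(build_prompt(item, hint)).strip().lower()
        predicted = answer.startswith("yes")
        correct += predicted == item.contradictory
    return correct / len(items)
```

As a sanity check of the harness itself, passing a stub `ask_llm` that always answers "no" should score exactly the fraction of non-contradictory items in the dataset; comparing `hint=True` against `hint=False` then reproduces the two conditions contrasted in the abstract.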

Details

Paper ID
lrec2026-main-182
Pages
pp. 2323-2333
BibKey
gross-etal-2026-ragability
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Stephanie Gross
  • Johann Petrak
  • Brigitte Krenn
