
C4: A Multilingual Benchmark for Retrieval-Augmented Generation Based on the Catechism of the Catholic Church and Its Compendium

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5a8nuzcc3cq3

Abstract

We introduce a new multilingual case study for evaluating retrieval-augmented generation (RAG) systems, based on the Catechism of the Catholic Church and its Compendium. The Catechism is a structured document with numbered paragraphs, officially translated into many languages under strict editorial alignment. The Compendium reformulates this material into a question-answer format with explicit citations to the corresponding paragraphs. Together, they form a set of parallel monolingual corpora that share an identical semantic structure, enabling direct, controlled comparison of RAG performance across languages. Beyond its theological origin, this text pair closely mirrors real-world applications of RAG in institutional contexts, such as querying internal policy documents with associated FAQ-style summaries, making it a practical testbed for multilingual retrieval and grounded answer generation. We release our data collection scripts and baseline results for further research.
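As a rough illustration of how the paragraph citations in the Compendium could serve as retrieval gold labels, the sketch below scores a retriever by recall@k over Catechism paragraph numbers. It is not the authors' evaluation code: the `CompendiumQuestion` class, the `recall_at_k` and `mean_recall_at_k` helpers, the choice of recall@k as the metric, and the placeholder paragraph numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CompendiumQuestion:
    """Hypothetical record: one Compendium question and the paragraphs it cites."""
    question: str
    cited_paragraphs: set[int]  # Catechism paragraph numbers (placeholder values below)


def recall_at_k(retrieved: list[int], gold: set[int], k: int) -> float:
    """Fraction of gold paragraph numbers that appear among the top-k retrieved."""
    if not gold:
        return 0.0
    return len(set(retrieved[:k]) & gold) / len(gold)


def mean_recall_at_k(
    questions: list[CompendiumQuestion],
    retrieve: Callable[[str], list[int]],
    k: int = 5,
) -> float:
    """Average recall@k over all questions for one retriever (e.g. one language)."""
    scores = [recall_at_k(retrieve(q.question), q.cited_paragraphs, k) for q in questions]
    return sum(scores) / len(scores) if scores else 0.0


# Placeholder data only; these are not real Compendium questions or citations.
questions = [
    CompendiumQuestion("placeholder question 1", {1, 2}),
    CompendiumQuestion("placeholder question 2", {3}),
]


def dummy_retrieve(query: str) -> list[int]:
    """Stand-in retriever that always returns the same ranked paragraph numbers."""
    return [1, 3, 2]


score = mean_recall_at_k(questions, dummy_retrieve, k=2)
```

Because paragraph numbering is aligned across the official translations, the same gold sets could in principle be reused with a separate retriever per language, which is what would make the per-language scores directly comparable.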

Details

Paper ID
lrec2026-main-590
Pages
pp. 7446-7456
BibKey
dniken-etal-2026-c4
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Pius von Däniken

  • Mark Cieliebak

  • Jan Deriu

Links