RuBIN: A Russian Benchmark for Evaluating LLMs with Cultural Insights

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3um9hpbgpxph

Abstract

Understanding culture-specific knowledge is essential for developing language models that perform reliably across diverse social and linguistic settings. This work explores both methodological and practical aspects of evaluating culture-specific knowledge in large language models. Special attention is given to the multiple-choice question answering format as a tool for identifying and measuring such knowledge. An analysis of existing benchmarks reveals several limitations, including insufficient cultural sensitivity and the presence of uninformative distractor options. In response, the RuBIN benchmark is introduced – a dataset consisting of questions based on phrases that are widely known in Russian culture. The paper describes the process of selecting and filtering culturally relevant topics, generating plausible incorrect answers using LLMs, and annotating and testing the benchmark for cross-linguistic robustness. RuBIN helps identify current LLMs’ weaknesses in transferring cultural knowledge and can serve as a tool for further adapting these models to diverse linguistic and cultural contexts.

Details

Paper ID: lrec2026-main-326
Pages: pp. 4126-4140
BibKey: lazukova-etal-2026-rubin
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 978-2-493814-49-4
Conference: The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location: Palma, Mallorca, Spain
Date: 11 May 2026 to 16 May 2026

Authors

  • Polina Lazukova

  • Irina Piontkovskaya
