Bulgarian Massive Multitask Language Understanding Benchmark
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Assessing the broad general knowledge of Large Language Models (LLMs) across multiple domains in Bulgarian remains challenging due to the limited availability of Bulgarian evaluation benchmarks. To address this gap, we introduce the Bulgarian Massive Multitask Language Understanding benchmark (MMLU-BG), designed to evaluate whether LLMs possess generalised knowledge in Bulgarian beyond simple text prediction. This paper presents the structure, development protocol, and size of the MMLU-BG benchmark. We evaluate seven LLMs, selected according to specific criteria, on MMLU-BG and compare their results with those on the original English MMLU. The experiments demonstrate that MMLU-BG assesses multi-domain versatility and highlights the models’ strengths and weaknesses across different subject areas.