Evaluation of Two Leading Polish Language Models in a Real-world RAG Scenario

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/36igcwtic7tn

Abstract

This paper presents a comparative evaluation of two leading Polish instruction-tuned language models, Bielik-11B-v2.3-Instruct and PLLuM-12B-nc-chat, within a real-world Retrieval-Augmented Generation (RAG) system designed for the technical documentation of a low-code platform. The study aims to identify the optimal configuration of retrieval and generation components for Polish-language applications. The evaluation was conducted in two stages. First, several embedding models and retrieval methods were tested using standard information retrieval metrics, including NDCG. The OrlikB/KartonBERT-USE-base-v1 model combined with vector-based retrieval achieved the highest performance and was adopted for the second stage. In the generation phase, both models were evaluated using quantitative scoring and pairwise A/B testing with multiple evaluators to ensure robustness. Results show that Bielik-11B-v2.3-Instruct consistently outperformed PLLuM-12B-nc-chat in producing accurate and contextually relevant answers. The study highlights the importance of constructing a reliable golden set, employing a two-phase evaluation pipeline, and selecting appropriate metrics to ensure objective and reproducible assessment of RAG systems in real-world Polish-language contexts.
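As an illustration of the first-stage retrieval metric, the following is a minimal sketch of NDCG at a cutoff k. The abstract does not state which NDCG variant, cutoff, or relevance scale the authors used, so the linear-gain formulation, the function names, and the example labels here are assumptions, not the paper's implementation.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels.

    Uses linear gain (rel) and a log2(rank + 1) discount with 1-based ranks;
    this variant is an assumption, not taken from the paper.
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, golden_relevances, k):
    """NDCG@k for one query.

    ranked_relevances: relevance labels of the retrieved chunks, in rank order.
    golden_relevances: labels of all relevant chunks for the query in the
                       golden set (hypothetical input), used for the ideal ranking.
    """
    ideal = sorted(golden_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant chunk exists for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # One relevant chunk in the golden set, retrieved at rank 2 out of 3.
    print(ndcg_at_k([0, 1, 0], [1], k=3))
```

Averaging this per-query score over all golden-set questions gives a single number per embedding model and retrieval method, which is one way such configurations could be compared.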

Details

Paper ID
lrec2026-main-211
Pages
pp. 2698-2704
BibKey
bartanowicz-etal-2026-evaluation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Szymon Bartanowicz

  • Krzysztof Jassem
