LREC 2026 workshop

Improving Slovene Language Models for Lexicographic Question Answering through Continued Pretraining and Instruction Fine-Tuning

Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)

DOI:10.63317/2t3zgpnbv52e

Abstract

This paper presents a two-stage training approach to improve the performance of Slovene large language models on lexicographic question-answering tasks. We developed a comprehensive lexical pretraining corpus containing 356,294 Slovene word entries, constructed by converting structured data from multiple lexicographic sources into Markdown format. Additionally, we created a question-answering dataset with 10,485 QA pairs from diverse sources, including automatically generated questions, a linguistic advisory portal, and community forums. Using the Slovenian GaMS model (based on Gemma 2 9B) and the GaMS 3 model (based on Gemma 3 12B), we performed continued pretraining on the lexical corpus, followed by instruction fine-tuning on our QA dataset combined with translated general-domain questions. We compared the results against other model configurations. Our results demonstrate significant improvements in answering Slovene lexicographic questions (text similarity increasing from 0.226 to 0.542, BERTScore F1 of 0.915), validating the effectiveness of domain-specific continued pretraining for low-resource languages.
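
The corpus-construction step mentioned in the abstract, converting structured lexicographic data into Markdown, might look roughly like the sketch below. This is only an illustration under stated assumptions: the entry schema (`lemma`, `pos`, `senses`, `definition`, `examples`), the function name `entry_to_markdown`, and the sample entry are hypothetical and are not taken from the paper or its data sources.

```python
def entry_to_markdown(entry: dict) -> str:
    """Render one dictionary entry (hypothetical schema) as a Markdown snippet."""
    # The lemma becomes a top-level heading.
    lines = [f"# {entry['lemma']}", ""]

    # Optional part-of-speech line.
    if entry.get("pos"):
        lines.append(f"**Part of speech:** {entry['pos']}")
        lines.append("")

    # Each sense becomes a numbered item; usage examples are nested bullets.
    for i, sense in enumerate(entry.get("senses", []), start=1):
        lines.append(f"{i}. {sense['definition']}")
        for example in sense.get("examples", []):
            lines.append(f"   - *{example}*")

    # Normalize trailing whitespace to a single newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical sample entry, invented for illustration only.
sample = {
    "lemma": "miza",
    "pos": "noun",
    "senses": [
        {
            "definition": "a piece of furniture with a flat top",
            "examples": ["Knjiga je na mizi."],
        }
    ],
}

markdown = entry_to_markdown(sample)
print(markdown)
```

Rendering each entry as a small, self-contained Markdown document is one plausible way to turn records from differently structured sources into uniform plain text for continued pretraining; the layout actually used by the authors may differ.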

Details

Paper ID
lrec2026-ws-slide-10
Pages
pp. 114-123
BibKey
knez-etal-2026-improving
Editors
Erhard Hinrichs (Tübingen University, Germany), Joakim Nivre (Uppsala University, Sweden), Petya Osenova (Sofia University, Bulgaria), James Pustejovsky (Brandeis University, USA), Claus Zinn (Tübingen University, Germany)
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Timotej Knez

  • Slavko Zitnik

Links