LREC 2026 workshop

Small Can Be Beautiful in LLMs for SSH: a Case for Bulgarian

Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026

DOI:10.63317/5n9bq7g3it2s

Abstract

In this paper we present a set of small LLM-based models for solving basic NLP tasks for Bulgarian: POS tagging, lemmatization, dependency parsing, named entity recognition, named entity linking, and event annotation, among others. To create fine-tuned models for these tasks, we first pre-train models of different sizes on Bulgarian data only, using architectures such as BERT, Modern-BERT, and T5. For each task we report our fine-tuning approach, the experimental results, and the evaluation. We then define a way to visualize the results over HTML documents that contain the analyzed texts. Our rationale is as follows: most, if not all, SSH research scenarios need reliable processing chains that can be customized to their specific needs. These scenarios also need proper visualization for human observation. We aim to provide such a basic LLM-based toolkit.

Details

Paper ID
lrec2026-ws-llms4ssh-20
Pages
pp. 187-197
BibKey
simov-etal-2026-small
Editors
Arturo Montejo-Raez, Cristina Grisot, Joanna Blochowiak, Nikola Ljubešić, Elena Battaner, German Rigau
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Kiril Simov
  • Nikolay Paev
  • Petya Osenova
  • Teodor Valchev
  • Stefan Marinov

Links