Small Can Be Beautiful in LLMs for SSH: a Case for Bulgarian
Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026
Abstract
In this paper we present a set of small LLM-based models for solving the basic NLP tasks for Bulgarian: POS tagging, lemmatization, dependency parsing, named entity recognition, named entity linking, and event annotation, among others. To create fine-tuned models for these tasks, we first pre-train models of different sizes on Bulgarian data only, using architectures such as BERT, ModernBERT, and T5. For each task we report our fine-tuning approach, the experimental results, and the evaluation. We then define a way to visualize the results in HTML documents that contain the analyzed texts. Our rationale is as follows: most, if not all, SSH research scenarios need reliable processing chains that can be customized to their specific needs. These scenarios also need proper visualization for human inspection. We aim to provide such a basic LLM-based toolkit.