Morphological Annotation of Old Serbian in Universal Dependencies

Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @ LREC 2026

DOI:10.63317/47f9i6wdkxtq

Abstract

We report on the morphological tagging of Old Serbian in the Universal Dependencies framework. To facilitate manual annotation, we pre-processed the data with the Old Church Slavonic 2.12 UDPipe model. We chose this model because of the known similarity between the two languages and because of its declared performance relative to other models for historical varieties of Slavic languages. Using over 3,000 manually annotated tokens, we evaluated the relevant pre-trained UDPipe2 models for historical Slavic languages. We also trained and evaluated custom UDPipe1 models whose training data included the annotated Old Serbian data. We found that (1) for this particular domain and amount of training data, the most suitable model is UD Old East Slavic – Birchbark 2.12, although its declared performance is much lower than that of the Old Church Slavonic model; and (2) even 3,000 tokens of Old Serbian raise the performance of UDPipe1 models almost to the level of the Birchbark 2.12 model. The dataset is publicly available at https://doi.org/10.5281/zenodo.19317842.
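To make concrete what evaluating a tagger against manually annotated tokens involves, the following is a minimal Python sketch that compares a gold CoNLL-U file with a model's output and reports accuracy for UPOS and morphological features. It is an illustration only, not the evaluation procedure used in the paper: the function names `read_conllu_tokens` and `tagging_accuracy` are hypothetical, and the sketch assumes that both files contain exactly the same tokenisation, so tokens can be compared position by position.

```python
def read_conllu_tokens(text):
    """Return (form, upos, feats) for every regular token line in CoNLL-U text."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        feats = frozenset() if cols[5] == "_" else frozenset(cols[5].split("|"))
        tokens.append((cols[1], cols[3], feats))
    return tokens


def tagging_accuracy(gold_text, pred_text):
    """Token-level accuracy for UPOS and FEATS, assuming identical tokenisation."""
    gold = read_conllu_tokens(gold_text)
    pred = read_conllu_tokens(pred_text)
    if not gold:
        raise ValueError("gold data contains no tokens")
    if len(gold) != len(pred):
        raise ValueError("gold and predicted tokenisation differ")
    n = len(gold)
    upos_hits = sum(g[1] == p[1] for g, p in zip(gold, pred))
    feats_hits = sum(g[2] == p[2] for g, p in zip(gold, pred))
    return {"UPOS": upos_hits / n, "UFeats": feats_hits / n}
```

Comparing feature sets rather than raw strings keeps the score independent of the order in which features are listed. When the model's tokenisation can differ from the gold standard, a script that aligns the two token sequences first would be needed instead.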

Details

Paper ID
lrec2026-ws-lt4hala-01
Pages
pp. 1-6
BibKey
polomac-etal-2026-morphological
Editors
Rachele Sprugnoli, Marco Passarotti
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Vladimir Polomac

  • Silvie Cinkova