LREC 2026 Workshop

LLM as a Morphological Disambiguator for Belarusian: A Preliminary Study

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

DOI: 10.63317/3skazxbd27m8

Abstract

We explore the use of large language models (LLMs) for morphological disambiguation in Belarusian, a low-resource language. The pipeline has two stages: a rule-based analyzer generates candidate lemmas and grammatical tags, which an LLM then disambiguates in context. Initial evaluation of ChatGPT, Claude, and Gemini on a gold-standard sample shows high accuracy. We scale this approach to a 375K-word corpus using Gemini and compare the results against a neural baseline (Stanza). Manual review of discrepancies suggests that the LLM-based approach outperforms the baseline, offering a solution for corpus annotation in Belarusian.
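
The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` type, the toy lexicon standing in for the rule-based analyzer, the tag strings, the prompt wording, and the `choose` callback (a placeholder for a call to ChatGPT, Claude, or Gemini) are all hypothetical. The `discrepancies` helper only illustrates how tokens where the LLM output and a baseline such as Stanza disagree might be collected for manual review.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Candidate:
    lemma: str
    tags: str  # illustrative tag string, not the paper's actual tagset


# Hypothetical stand-in for the rule-based analyzer: a tiny lookup table.
# The entries are toy examples chosen only to show an ambiguous form.
TOY_LEXICON: Dict[str, List[Candidate]] = {
    "мая": [
        Candidate("мой", "PRON|Fem|Nom|Sing"),
        Candidate("май", "NOUN|Masc|Gen|Sing"),
    ],
    "сястра": [Candidate("сястра", "NOUN|Fem|Nom|Sing")],
}


def analyze(token: str) -> List[Candidate]:
    """Stage 1: return every candidate (lemma, tags) reading for a token."""
    return TOY_LEXICON.get(token.lower(), [Candidate(token, "X")])


def build_prompt(tokens: Sequence[str], index: int,
                 candidates: Sequence[Candidate]) -> str:
    """Format the sentence, the target token, and the numbered candidates."""
    options = "\n".join(
        f"{i}. lemma={c.lemma} tags={c.tags}"
        for i, c in enumerate(candidates, start=1)
    )
    return (
        f"Sentence: {' '.join(tokens)}\n"
        f"Target token: {tokens[index]}\n"
        f"Candidates:\n{options}\n"
        "Answer with the number of the correct candidate."
    )


def disambiguate(tokens: Sequence[str],
                 choose: Callable[[str], int]) -> List[Candidate]:
    """Stage 2: ask `choose` (an LLM wrapper) to pick among candidates."""
    result: List[Candidate] = []
    for i, token in enumerate(tokens):
        candidates = analyze(token)
        if len(candidates) == 1:
            # Unambiguous tokens never reach the model.
            result.append(candidates[0])
            continue
        pick = choose(build_prompt(tokens, i, candidates))
        if not 1 <= pick <= len(candidates):
            pick = 1  # assumption: fall back to the first candidate
        result.append(candidates[pick - 1])
    return result


def discrepancies(llm_output: Sequence[Candidate],
                  baseline_output: Sequence[Candidate]) -> List[int]:
    """Indices where two annotations of the same tokens disagree."""
    return [
        i for i, (a, b) in enumerate(zip(llm_output, baseline_output))
        if a != b
    ]
```

In use, `choose` would wrap whichever model is being evaluated and parse its reply into an integer; passing a stub such as `lambda prompt: 1` is enough to exercise the control flow without any network access. Restricting the model to analyzer-generated candidates keeps its output inside the analyzer's tag inventory, which is one plausible motivation for the two-stage design.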

Details

Paper ID: lrec2026-ws-sigul-04
Pages: pp. 42–48
BibKey: poritski-etal-2026-llm
Editors: Atul Kr. Ojha, Sakriani Sakti, Claudia Soria, Maite Melero, John P. McCrae, Constantine Lignos, Chao-Hong Liu, German Rigau Claramunt, Georg Rehm
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"
Location: Palma, Mallorca, Spain
Date: 11–16 May 2026

Authors

  • Vladislav Poritski
  • Oksana Volchek
  • Ilia Afanasev
