LLM as a Morphological Disambiguator for Belarusian: A Preliminary Study

Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"

DOI:10.63317/3skazxbd27m8

Abstract

We explore the use of large language models (LLMs) for morphological disambiguation in Belarusian, a low-resource language. The pipeline has two stages: a rule-based analyzer generates candidate lemmas and grammatical tags, which an LLM then disambiguates in context. Initial evaluation of ChatGPT, Claude, and Gemini on a gold-standard sample shows high accuracy. We scale this approach to a 375K-word corpus using Gemini and compare the results against a neural baseline (Stanza). Manual review of discrepancies suggests that the LLM-based approach outperforms the baseline, offering a solution for corpus annotation in Belarusian.
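
The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon stands in for the rule-based analyzer, the UD-style tag strings are illustrative rather than the paper's tagset, the prompt wording is invented, and `ask_llm` is a hypothetical callable that would wrap whichever model API is used.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    lemma: str
    tags: str


# Hypothetical toy lexicon standing in for the rule-based analyzer (assumption).
# "мая" is ambiguous: feminine form of "мой" (my) or genitive of "май" (May).
TOY_LEXICON = {
    "мая": [
        Candidate("мой", "DET Case=Nom Gender=Fem Number=Sing"),
        Candidate("май", "NOUN Case=Gen Number=Sing"),
    ],
}


def analyze(token: str) -> list[Candidate]:
    """Stage 1: return every candidate analysis known for the token."""
    return TOY_LEXICON.get(token.lower(), [])


def build_prompt(tokens: list[str], index: int, candidates: list[Candidate]) -> str:
    """Format the sentence and the numbered candidates (wording is an assumption)."""
    options = "\n".join(
        f"{n}. lemma={c.lemma}; tags={c.tags}"
        for n, c in enumerate(candidates, start=1)
    )
    return (
        f"Sentence: {' '.join(tokens)}\n"
        f"Target word (position {index + 1}): {tokens[index]}\n"
        f"Candidate analyses:\n{options}\n"
        "Reply with the number of the analysis that fits the context."
    )


def parse_choice(reply: str, candidates: list[Candidate]) -> Candidate | None:
    """Read the first integer in the reply; return None if it is missing or out of range."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    idx = int(match.group()) - 1
    return candidates[idx] if 0 <= idx < len(candidates) else None


def disambiguate(
    tokens: list[str], ask_llm: Callable[[str], str]
) -> list[Candidate | None]:
    """Stage 2: query the model only for tokens with more than one candidate."""
    result: list[Candidate | None] = []
    for i, token in enumerate(tokens):
        candidates = analyze(token)
        if not candidates:
            result.append(None)  # unknown to the analyzer
        elif len(candidates) == 1:
            result.append(candidates[0])  # unambiguous, no model call needed
        else:
            reply = ask_llm(build_prompt(tokens, i, candidates))
            result.append(parse_choice(reply, candidates))
    return result
```

Because the model only picks among analyzer-generated candidates, its answer is constrained to analyses the analyzer considers valid, and an unparseable reply surfaces as `None` rather than as an invented tag.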

Details

Paper ID

lrec2026-ws-sigul-04

Pages

pp. 42-48

BibKey

poritski-etal-2026-llm

Editors

Atul Kr. Ojha, Sakriani Sakti, Claudia Soria, Maite Melero, John P. McCrae, Constantine Lignos, Chao-Hong Liu, German Rigau Claramunt, Georg Rehm

Publisher

European Language Resources Association (ELRA)

Location

Palma, Mallorca, Spain

Date

11–16 May 2026

Authors

Vladislav Poritski
Oksana Volchek
Ilia Afanasev
