LLM as a Morphological Disambiguator for Belarusian: A Preliminary Study
Proceedings of the SIGUL 2026 Joint Workshop with ELE, EURALI, and DCLRL "Towards Inclusivity and Equality: Language Resources and Technologies for Under-Resourced and Endangered Languages"
Abstract
We explore the use of large language models (LLMs) for morphological disambiguation in Belarusian, a low-resource language. The pipeline has two stages: a rule-based analyzer first generates candidate lemmas and grammatical tags for each word, and an LLM then selects the correct candidate in context. An initial evaluation of ChatGPT, Claude, and Gemini on a gold-standard sample shows high accuracy. We then scale the approach to a 375K-word corpus using Gemini and compare its output against a neural baseline (Stanza). Manual review of the discrepancies between the two systems suggests that the LLM-based approach outperforms the baseline, making it a practical option for corpus annotation in Belarusian.
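The two-stage pipeline can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the toy lexicon, the tag strings, the prompt wording, and all names (`Candidate`, `analyze`, `build_prompt`, `parse_choice`, `disambiguate`) are assumptions made for illustration, and the LLM is passed in as a plain callable so that no particular vendor API is implied.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Candidate:
    lemma: str
    tags: str  # illustrative tag string, not the tag set used in the study


# Hypothetical toy lexicon standing in for the rule-based analyzer.
TOY_LEXICON = {
    "мая": [
        Candidate("мой", "PRON;Fem;Sing;Nom"),   # "my" (feminine)
        Candidate("май", "NOUN;Masc;Sing;Gen"),  # "of May"
    ],
    "сястра": [Candidate("сястра", "NOUN;Fem;Sing;Nom")],
}


def analyze(token: str) -> List[Candidate]:
    """Stage 1: return every candidate analysis for a word form."""
    return TOY_LEXICON.get(token.lower(), [])


def build_prompt(tokens: List[str], index: int, candidates: List[Candidate]) -> str:
    """Describe the sentence, the target word, and the numbered candidates."""
    options = "\n".join(
        f"{i}. lemma={c.lemma} tags={c.tags}" for i, c in enumerate(candidates, 1)
    )
    return (
        f"Sentence: {' '.join(tokens)}\n"
        f"Target word: {tokens[index]} (position {index + 1})\n"
        f"Candidates:\n{options}\n"
        "Answer with the number of the correct candidate only."
    )


def parse_choice(reply: str, n_candidates: int) -> Optional[int]:
    """Extract a zero-based candidate index from the reply, or None if invalid."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    choice = int(match.group())
    return choice - 1 if 1 <= choice <= n_candidates else None


def disambiguate(
    tokens: List[str], ask_llm: Callable[[str], str]
) -> List[Optional[Candidate]]:
    """Stage 2: query the LLM only for words with more than one candidate."""
    results: List[Optional[Candidate]] = []
    for i, token in enumerate(tokens):
        candidates = analyze(token)
        if not candidates:
            results.append(None)  # unknown to the analyzer
        elif len(candidates) == 1:
            results.append(candidates[0])  # unambiguous, no LLM call needed
        else:
            choice = parse_choice(ask_llm(build_prompt(tokens, i, candidates)), len(candidates))
            results.append(candidates[choice] if choice is not None else None)
    return results
```

In this sketch, `ask_llm` would wrap whichever model is being evaluated. Restricting the model to a numbered choice among analyzer candidates keeps its output inside the analyzer's tag inventory, and a reply that does not name a valid candidate yields `None` rather than a guessed analysis.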