Russian Generative Spelling, Punctuation and Capitalization Correction

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2gv3b9npuo2s

Abstract

This paper presents SAGE, an open-access framework comprising a set of models specifically designed for the generative correction of spelling, punctuation, and capitalization errors in Russian. The release includes four models, among them a Russian-English version and a distilled version for easy use and cost-effectiveness. The models are pre-trained using a sequence-to-sequence approach on artificial errors that mimic human mistakes and fine-tuned on annotated multi-domain texts. A set of carefully engineered auxiliary learning objectives is employed during pre-training to enrich the models with additional semantic and syntactic information. Evaluations indicate that SAGE models, despite having a small number of parameters, outperform top-tier multilingual and Russian-specific large language models, including both closed- and open-source options, and are considered state-of-the-art. We release an online demo as a Web service, powered by a single Nvidia A100 80GB GPU, which allows users to simultaneously test the most advanced SAGE model of 1.7B parameters, its distilled version, and the Russian-English SAGE model.

Details

Paper ID
lrec2026-main-773
Pages
pp. 9861-9872
BibKey
martynov-etal-2026-russian
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 – 16 May 2026

Authors

  • Nikita Martynov

  • Danil Astafurov

  • Ulyana Isaeva

  • Ivan Vasil'yevich Maksimov

  • Joqsan Azocar

  • Dmitrii Kosenko

  • Alena Fenogenova

Links