Extending Retag to Conversion Error Detection: A Case Study on SynTagRus Morphology

Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)

DOI:10.63317/4uhg28waoj59

Abstract

Linguistically annotated corpora are often converted between annotation schemes, but errors introduced during conversion can compromise their reliability. While annotation error detection is a well-studied topic, conversion error detection remains largely unexplored. We adapt the Retag method, traditionally used for finding annotation errors, to identify conversion errors by comparing model performance on the original and converted versions of the same corpus, aligned at the token level. Applying this approach to the SynTagRus corpus converted to Universal Dependencies, we achieve high-precision detection of conversion errors in morphological annotation. Our analysis reveals systematic errors in distinguishing auxiliary verbs, pronouns, numerals, and multi-word named entities, and uncovers previously undocumented annotation inconsistencies between different sections of the corpus. The method can be applied to any converted dataset for which an aligned source is available, providing an efficient way to target conversion errors for manual correction without exhaustive inspection.
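
The token-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the decision rule, so the rule used here (flag a token when a tagger reproduces the original annotation but disagrees with the converted one) is an assumption, and the `AlignedToken` type, the `flag_conversion_suspects` function, and the toy tags are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedToken:
    """One token aligned across both corpus versions (hypothetical structure)."""
    form: str
    source_gold: str     # tag in the original annotation scheme
    source_pred: str     # prediction of a tagger trained on the original corpus
    converted_gold: str  # tag produced by the conversion
    converted_pred: str  # prediction of a tagger trained on the converted corpus


def flag_conversion_suspects(tokens):
    """Return indices of tokens that look like conversion errors.

    Assumed rule: the model agrees with the original annotation, so the
    source label is probably sound, yet it disagrees with the converted
    annotation, so the disagreement is likely due to the conversion step.
    """
    return [
        i
        for i, tok in enumerate(tokens)
        if tok.source_pred == tok.source_gold
        and tok.converted_pred != tok.converted_gold
    ]


# Invented toy data for illustration only; the tags are not taken from the paper.
tokens = [
    AlignedToken("дом", "S", "S", "NOUN", "NOUN"),  # consistent in both versions
    AlignedToken("был", "V", "V", "VERB", "AUX"),   # disagreement only after conversion
    AlignedToken("три", "NUM", "A", "NUM", "ADJ"),  # disagreement already in the source
]

suspects = flag_conversion_suspects(tokens)
print([tokens[i].form for i in suspects])
```

Under this assumed rule, the third token is not flagged, because the disagreement is already present in the original version and therefore points to an annotation issue rather than a conversion issue.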

Details

Paper ID
lrec2026-ws-udw-28
Pages
pp. 305-314
BibKey
movsesian-etal-2026-extending
Editors
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Andrei Movsesian

  • Daniil Timchenko

Links