Extending Retag to Conversion Error Detection: A Case Study on SynTagRus Morphology
Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)
Abstract
Linguistically annotated corpora are often converted between annotation schemes, but errors introduced during conversion can compromise their reliability. While annotation error detection is a well-studied topic, conversion error detection remains largely unexplored. We adapt the Retag method, traditionally used to find annotation errors, to identify conversion errors by comparing model performance on the original and converted versions of the same corpus, aligned at the token level. Applying this approach to the SynTagRus corpus converted to Universal Dependencies, we achieve high-precision detection of conversion errors in morphological annotation. Our analysis reveals systematic errors in distinguishing auxiliary verbs, pronouns, numerals, and multi-word named entities, and uncovers previously undocumented annotation inconsistencies between different sections of the corpus. The method can be applied to any converted dataset for which an aligned source is available, providing an efficient way to target conversion errors for manual correction without exhaustive inspection.
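The following is a rough illustration of the token-level comparison described in the abstract. It is a minimal sketch and does not reproduce the paper's procedure. It assumes that each aligned token carries a gold tag and a cross-validated tagger prediction in both the original and the converted annotation. The `AlignedToken` fields, the `flag_conversion_candidates` helper, the flagging rule, and the tag values are all hypothetical. The sketch flags a token when the original-scheme tagger reproduces the original tag but the converted-scheme tagger disagrees with the converted tag. Under that assumption, this pattern points to an inconsistency introduced by conversion rather than one already present in the source annotation.

```python
from dataclasses import dataclass


@dataclass
class AlignedToken:
    # Hypothetical record for one token aligned across both corpus versions.
    form: str
    orig_gold: str  # tag in the original (source) annotation
    orig_pred: str  # cross-validated prediction of a tagger trained on the source
    conv_gold: str  # tag in the converted annotation
    conv_pred: str  # cross-validated prediction of a tagger trained on the converted data


def flag_conversion_candidates(tokens):
    """Return indices of tokens that may contain conversion errors.

    Assumed rule (illustrative only): the source-scheme tagger agrees with the
    source gold tag, but the target-scheme tagger disagrees with the converted
    gold tag.
    """
    flagged = []
    for i, tok in enumerate(tokens):
        source_consistent = tok.orig_pred == tok.orig_gold
        target_inconsistent = tok.conv_pred != tok.conv_gold
        if source_consistent and target_inconsistent:
            flagged.append(i)
    return flagged


# Illustrative tokens; the tag values are made up for the example.
tokens = [
    AlignedToken("был", "V", "V", "VERB", "AUX"),    # flagged: disagreement only after conversion
    AlignedToken("дом", "S", "S", "NOUN", "NOUN"),    # consistent in both versions
    AlignedToken("три", "NUM", "A", "NUM", "DET"),    # source already inconsistent: not flagged
    AlignedToken("он", "S-PRO", "S-PRO", "PRON", "PRON"),
]

print(flag_conversion_candidates(tokens))
```

Tokens where the source-scheme tagger already disagrees with the source gold tag are deliberately left unflagged here. The classic Retag setting would treat them as candidate errors in the original annotation itself, not as conversion errors.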