
Cross-Dataset Inconsistencies in Morphological Annotation: Evidence from Universal Dependencies

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/55hiti2bjus3

Abstract

Ensuring annotation consistency is a challenging task in language dataset development. While the difficulty typically increases at higher levels of linguistic complexity, we show that it is a critical issue even for fundamental linguistic tasks such as morphological annotation. In contrast to previous research, which targeted intra-dataset inconsistencies, this study investigates inconsistencies across multiple pre-existing datasets for the same language. Using Universal Dependencies datasets as an example, we examined which morphological categories exhibit the most disagreement. The analysis revealed that certain categories have low inconsistency scores, indicating good agreement on these features (namely Case, Gender, Number and, to a lesser extent, Animacy). On the other hand, the Part-of-Speech (UPOS) tag stands out as a "red flag" due to its high inconsistency score. Analysis of the most frequent inconsistencies suggests that they are dataset-specific artifacts rather than inherently language-specific phenomena.

Details

Paper ID
lrec2026-main-917
Pages
pp. 11715-11723
BibKey
ohldalov-2026-cross
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Vlasta Ohlídalová
