Greenberg’s Universal 45 in Universal Dependencies: Gender Distinctions and Annotation Challenges
Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)
Abstract
This paper revisits and extends Greenberg’s Universal 45 on gender distinctions using Universal Dependencies 2.17, comprising 339 treebanks across 186 languages. A systematic analysis of morphosyntactic patterns confirms the implicational hierarchy (singular > plural gender marking: gender distinctions in the plural imply gender distinctions in the singular), with 98.6% conformity in pronominal categories. Only two potential exceptions are detected, both with very few occurrences and likely attributable to annotation errors. Extending the analysis beyond pronouns to 13 UPOS categories shows that core categories maintain near-perfect compliance, while peripheral categories exhibit higher violation rates, driven primarily by annotation inconsistencies rather than genuine linguistic exceptions. A total of 90 treebanks display gender-number features in traditionally invariable categories (e.g., adpositions, conjunctions, adverbs), pointing to annotation issues such as the handling of prepositional contractions, homophone merging, and erroneous feature assignment. The study establishes a replicable computational methodology for large-scale typological validation, highlighting both the potential of corpus-based approaches and their key limitations: genealogical sampling biases, annotation heterogeneity despite universal schemas, and a false sense of comparability across treebanks.
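To make the kind of check described above concrete, the following is a minimal sketch of how the implication could be tested on a single treebank in CoNLL-U format. It is an illustration under stated assumptions, not the procedure used in the paper: the function names (parse_feats, gender_by_number, violates_universal_45) are hypothetical, and the criterion for a "gender distinction" (at least two distinct Gender values attested for a given Number value within one UPOS category) is an assumption made here for simplicity.

```python
from collections import defaultdict


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Gender=Fem|Number=Sing'."""
    if not feats or feats == "_":
        return {}
    parsed = {}
    for pair in feats.split("|"):
        key, sep, value = pair.partition("=")
        if sep:
            # Multi-valued features are comma-separated, e.g. Gender=Fem,Masc
            parsed[key] = set(value.split(","))
    return parsed


def gender_by_number(conllu_text):
    """Collect, per UPOS and Number value, the set of attested Gender values."""
    table = defaultdict(lambda: defaultdict(set))
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, upos, feats = cols[0], cols[3], parse_feats(cols[5])
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword token ranges and empty nodes
        genders = feats.get("Gender", set())
        for number in feats.get("Number", set()):
            table[upos][number] |= genders
    return table


def violates_universal_45(table, upos="PRON"):
    """Assumed criterion: a violation is a gender distinction (two or more
    Gender values) in the plural without one in the singular."""
    by_number = table.get(upos, {})
    plural = by_number.get("Plur", set())
    singular = by_number.get("Sing", set())
    return len(plural) > 1 and len(singular) <= 1


if __name__ == "__main__":
    # Toy three-token sample (hypothetical data, not taken from any treebank).
    rows = [
        ["1", "he", "he", "PRON", "_", "Gender=Masc|Number=Sing", "0", "root", "_", "_"],
        ["2", "she", "she", "PRON", "_", "Gender=Fem|Number=Sing", "0", "root", "_", "_"],
        ["3", "they", "they", "PRON", "_", "Number=Plur", "0", "root", "_", "_"],
    ]
    sample = "\n".join("\t".join(r) for r in rows)
    print(violates_universal_45(gender_by_number(sample)))  # prints False
```

Aggregating Gender values per Number value, rather than inspecting individual tokens, mirrors the system-level nature of the universal: it concerns which distinctions a category makes, not how any single form is tagged. The same table can be queried for any UPOS tag, which also surfaces Gender and Number features on categories such as ADP or CCONJ where they would not normally be expected.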