LREC 2026 workshop

Greenberg’s Universal 45 in Universal Dependencies: Gender Distinctions and Annotation Challenges

Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)

DOI: 10.63317/56hd7e52srga

Abstract

This paper revisits and extends Greenberg’s Universal 45 on gender distinctions using Universal Dependencies 2.17, comprising 339 treebanks across 186 languages. A systematic analysis of morphosyntactic patterns confirms the implicational hierarchy (singular > plural gender marking), with 98.6 % conformity in pronominal categories. Only two potential exceptions are detected, both with minimal occurrences and likely attributable to annotation errors. Extending the analysis beyond pronouns to 13 UPOS categories shows that core categories maintain near-perfect compliance, while peripheral categories exhibit higher violation rates, primarily driven by annotation inconsistencies rather than genuine linguistic exceptions. A total of 90 treebanks display gender-number features in traditionally invariable categories (e.g., adpositions, conjunctions, adverbs), indicating annotation issues such as prepositional contraction handling, homophone merging, and erroneous feature assignment. The study establishes a replicable computational methodology for large-scale typological validation, highlighting both the potential of corpus-based approaches and key limitations, including genealogical sampling biases, annotation heterogeneity despite universal schemas, and the false sense of comparability across treebanks.
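The two checks summarized in the abstract (singular > plural gender marking in pronouns, and gender-number features on traditionally invariable categories) can be approximated with a short script over CoNLL-U files. The sketch below is not the authors' pipeline. It assumes CoNLL-U input, treats Universal 45 as a per-treebank implication (any `Gender` on plural tokens of a UPOS implies some `Gender` on singular tokens of that UPOS), and uses a hypothetical set `INVARIABLE_UPOS` to flag `Gender` or `Number` on adpositions, conjunctions and adverbs. The function names and the toy sentence are illustrative only.

```python
from collections import defaultdict

# Hypothetical set of "traditionally invariable" UPOS tags to audit (illustrative subset).
INVARIABLE_UPOS = {"ADP", "CCONJ", "SCONJ", "ADV"}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS value such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def scan_conllu(lines):
    """Record which UPOS carry Gender in Sing/Plur, and flag invariable-category hits."""
    # seen[upos]["Sing"/"Plur"] becomes True once any such token bears a Gender feature
    seen = defaultdict(lambda: {"Sing": False, "Plur": False})
    flagged = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges ("3-4") and empty nodes ("5.1")
        form, upos, feats = cols[1], cols[3], parse_feats(cols[5])
        number = feats.get("Number")
        has_gender = "Gender" in feats
        if has_gender and number in ("Sing", "Plur"):
            seen[upos][number] = True
        if upos in INVARIABLE_UPOS and (has_gender or number is not None):
            flagged.append((form, upos))
    return seen, flagged


def violates_u45(seen, upos="PRON"):
    """Universal 45 as an implication: Gender in the plural implies Gender in the singular."""
    obs = seen.get(upos, {"Sing": False, "Plur": False})
    return obs["Plur"] and not obs["Sing"]


# Toy, hand-made CoNLL-U fragment (not real treebank data); the Gender on "de" is a planted error.
SAMPLE = [
    "# text = toy example",
    "1\tellos\tél\tPRON\t_\tGender=Masc|Number=Plur|Person=3|PronType=Prs\t0\troot\t_\t_",
    "2\tyo\tyo\tPRON\t_\tNumber=Sing|Person=1|PronType=Prs\t1\tdep\t_\t_",
    "3-4\tdel\t_\t_\t_\t_\t_\t_\t_\t_",
    "3\tde\tde\tADP\t_\tGender=Masc|Number=Sing\t4\tcase\t_\t_",
    "4\tel\tel\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t1\tdet\t_\t_",
]

seen, flagged = scan_conllu(SAMPLE)
print(violates_u45(seen))  # True: plural PRON has Gender, singular PRON never does
print(flagged)             # [('de', 'ADP')]
```

Skipping the `3-4` range line means features are read only from the syntactic words of a contraction such as Spanish *del*; a treebank that instead annotates the contracted form as a single ADP with `Gender` would surface in `flagged`, which is the kind of contraction-handling issue the abstract mentions. A study at the scale described would presumably aggregate these per-treebank results by language before counting exceptions; how that aggregation is done is not specified here.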

Details

Paper ID
lrec2026-ws-udw-15
Pages
pp. 174-182
BibKey
brosarodriguez-etal-2026-greenberg
Editors
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Antoni Brosa-Rodriguez
  • M. Dolores Jimenez Lopez
