Structural Divergence under Shared Language-Level Specification: Griko in Universal Dependencies
Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Abstract
Dialectal varieties pose major challenges for NLP resource development, especially when annotation frameworks are organized around standardized language specifications. In Universal Dependencies (UD), dialects without independent ISO codes are subsumed under the corresponding standard language and inherit its language-level documentation, validator settings, and grammatical inventories. This paper examines Griko, a Greek variety spoken in southern Italy that developed in relative isolation from the Modern Greek dialect continuum while remaining in long-term contact with local Italo-Romance varieties. We assess the consequences of this organizational structure through controlled parsing experiments comparing intra-dialectal training, cross-dialectal transfer from Standard Modern Greek (SMG), script-controlled transfer using romanized SMG, and contact-related cross-lingual transfer from Italian. Our results show that, before romanization, the Italian model surpasses the SMG model on several UD metrics and that, although romanization substantially improves SMG-based transfer, performance remains far below the intra-dialectal baseline. We argue that this persistent gap reflects the interaction between structural divergence and language-level validation constraints, a phenomenon we term ISO-based validation coupling. Through analyses of auxiliary systems, voice marking, and progressive constructions, we show how standard-centric validation architectures can constrain the representation of dialect-specific grammar. More broadly, the Griko case highlights the limitations of language-centric organization in UD and underscores the need for variety-sensitive mechanisms when extending universal annotation frameworks to structurally divergent dialects.