Gathering valency frames for annotation and batch corrections
Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)
Abstract
Syntactic annotation is time- and resource-consuming, especially for historical and heterogeneous data. The Universal Dependencies (UD) framework provides a stable and cross-linguistically consistent annotation scheme, which serves as a crucial backbone for diachronic corpus studies. However, ensuring internal consistency within historical UD treebanks remains challenging because of syntactic variation and parser errors. We address this issue for Medieval and Classical French by integrating valency information into the correction process to support UD treebank maintenance. Valency frames were extracted from the Profiterole treebank (v. 2.7) and used to enrich OFrLex with structured valency information for Medieval French, while existing lexical resources such as Lefff are used for Contemporary French. These valency frames serve to detect and correct inconsistencies in automatically annotated data through batch operations, thereby reinforcing compliance with the UD guidelines and improving annotation coherence across diachronic stages. Preliminary experiments on Medieval French and an exploratory annotation of Classical French data suggest that lexicon-informed error mining can reduce manual revision effort while strengthening the diachronic continuity that the UD framework enables.
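The abstract does not specify how frames are compared against new annotations. The following is a minimal Python sketch of lexicon-informed error mining under our own assumptions. It treats a verb's valency frame as the sorted tuple of its core dependency relations. The lexicon keeps only the frames attested at least `min_count` times in a reference treebank, standing in for the frames extracted from Profiterole. Any verb of a known lemma whose frame is unattested is flagged for review. The names `CORE_RELS`, `read_conllu`, `verb_frames`, `build_lexicon` and `flag_suspicious` are hypothetical, as are the relation set, the threshold and the toy sentence. This is not the authors' pipeline, and it does not reflect the OFrLex or Lefff formats.

```python
from collections import Counter, defaultdict

# ASSUMPTION: this set of "core" relations defining a valency frame is an
# illustrative choice; the paper does not specify it. Subtyped relations
# other than obl:arg (e.g. nsubj:pass) are ignored in this sketch.
CORE_RELS = {"nsubj", "obj", "iobj", "ccomp", "xcomp", "obl:arg"}


def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append({"id": int(cols[0]), "lemma": cols[2], "upos": cols[3],
                        "head": int(cols[6]), "deprel": cols[7]})
    if current:
        sentences.append(current)
    return sentences


def verb_frames(sentence):
    """Yield (lemma, token id, frame) for each VERB; a frame is the sorted
    tuple of the relations of its core dependents."""
    deps = defaultdict(list)
    for tok in sentence:
        if tok["deprel"] in CORE_RELS:
            deps[tok["head"]].append(tok["deprel"])
    for tok in sentence:
        if tok["upos"] == "VERB":
            yield tok["lemma"], tok["id"], tuple(sorted(deps[tok["id"]]))


def build_lexicon(sentences, min_count=2):
    """Map each verb lemma to the frames attested at least min_count times."""
    counts = Counter()
    for sent in sentences:
        for lemma, _, frame in verb_frames(sent):
            counts[(lemma, frame)] += 1
    lexicon = defaultdict(set)
    for (lemma, frame), n in counts.items():
        if n >= min_count:
            lexicon[lemma].add(frame)
    return dict(lexicon)


def flag_suspicious(sentences, lexicon):
    """Return (sentence index, token id, lemma, frame) for verbs of known
    lemmas whose observed frame is absent from the lexicon."""
    hits = []
    for i, sent in enumerate(sentences):
        for lemma, tok_id, frame in verb_frames(sent):
            if lemma in lexicon and frame not in lexicon[lemma]:
                hits.append((i, tok_id, lemma, frame))
    return hits


def toy(rows):
    """Build one CoNLL-U sentence from space-separated 10-column rows."""
    return "\n".join(r.replace(" ", "\t") for r in rows) + "\n\n"


# Toy data (invented for illustration): "Li rois voit le chevalier".
GOOD = ["1 Li le DET _ _ 2 det _ _",
        "2 rois roi NOUN _ _ 3 nsubj _ _",
        "3 voit veoir VERB _ _ 0 root _ _",
        "4 le le DET _ _ 5 det _ _",
        "5 chevalier chevalier NOUN _ _ 3 obj _ _"]
# Same sentence with the object mislabelled as iobj (a simulated parser error).
BAD = GOOD[:4] + ["5 chevalier chevalier NOUN _ _ 3 iobj _ _"]

reference = read_conllu(toy(GOOD) * 2)
lexicon = build_lexicon(reference)
flags = flag_suspicious(read_conllu(toy(GOOD) + toy(BAD)), lexicon)
print(flags)  # [(1, 3, 'veoir', ('iobj', 'nsubj'))]
```

A flagged verb is only a candidate error, not a confirmed one. To approximate batch correction, one could group the hits by lemma and frame, so that a reviewer can decide once for many occurrences of the same pattern.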