LREC 2026 workshop

From Treebank Metadata to Sentence-Level Genre in Universal Dependencies: A Reproducible, Versioned Resource

Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)

DOI: 10.63317/22ivc8whfgue

Abstract

We release a sentence-level genre layer for Universal Dependencies as a separate, joinable dataset, computed across UD revisions and linked back to the underlying treebanks via a release-aware composite key comprising treebank, split, sent_id, and UD release metadata. The annotations are derived rather than authoritative and are accompanied by provenance and uncertainty indicators, enabling downstream users to choose appropriate precision-coverage trade-offs and to re-run the pipeline as UD evolves. To support both parity tracking and deployment-oriented interpretation, we report results under two complementary regimes: a fixed-partition setting aligned with earlier protocols, and a language-grouped 10-fold generalisation setting that highlights cross-language heterogeneity and anchor sparsity as operational constraints. The resulting resource is intended to make genre a practical control variable for UD-based experimentation, including genre-stratified evaluation and training data selection for POS tagging and parsing, where performance varies substantially across text types. Finally, we note that reduced genre spaces aligned with recurring robustness profiles (e.g. transcribed speech versus interactional web/social text versus edited prose/news) appear pragmatically useful, but should be treated as a community coordination task implemented through explicit, versioned mapping tables.
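The abstract describes three steps for downstream users. First, join the genre layer to treebank sentences through a release-aware composite key. Second, filter labels by their uncertainty indicator to pick a precision-coverage trade-off. Third, optionally collapse fine-grained labels through a versioned mapping table. Below is a minimal Python sketch of that workflow. It is not the released dataset's schema or tooling. The record fields (`genre`, `confidence`, `provenance`), the fine-grained genre labels, the mapping table `REDUCED_GENRES_V1`, its fallback label `"other"`, and the function `join_genres` are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import NamedTuple


class SentKey(NamedTuple):
    """Hypothetical release-aware composite key: treebank, split, sent_id, UD release."""
    treebank: str
    split: str
    sent_id: str
    release: str


@dataclass(frozen=True)
class GenreRecord:
    """Hypothetical genre-layer row: a derived label plus provenance and confidence."""
    key: SentKey
    genre: str
    confidence: float  # assumed to lie in [0, 1]
    provenance: str    # e.g. "treebank-metadata" or "classifier" (illustrative values)


# Hypothetical versioned mapping from fine-grained genres to a reduced genre space
# (speech vs. interactional web/social text vs. edited prose/news).
REDUCED_GENRES_V1 = {
    "version": "v1",
    "map": {
        "spoken": "speech",
        "social": "interactional-web",
        "blog": "interactional-web",
        "news": "edited-prose",
        "wiki": "edited-prose",
        "fiction": "edited-prose",
    },
}


def join_genres(sentences, genre_layer, min_confidence=0.0, mapping=None):
    """Attach genre labels to treebank sentences via the composite key.

    sentences: dict mapping SentKey -> sentence payload (e.g. CoNLL-U text).
    Returns a dict SentKey -> (payload, genre). Sentences with no label for
    their exact key, or with confidence below min_confidence, are dropped,
    which trades coverage for precision.
    """
    index = {rec.key: rec for rec in genre_layer}
    joined = {}
    for key, payload in sentences.items():
        rec = index.get(key)
        if rec is None or rec.confidence < min_confidence:
            continue
        genre = rec.genre
        if mapping is not None:
            # Genres absent from the mapping table fall back to "other" (an assumption).
            genre = mapping["map"].get(genre, "other")
        joined[key] = (payload, genre)
    return joined
```

The sketch puts the UD release in the key on purpose. With the release included, a label computed for one release never silently attaches to a sentence with the same `sent_id` in another release, where that sentence may have been edited or re-split. Raising `min_confidence` shrinks coverage in exchange for more reliable labels. Passing a different mapping table changes the reduced genre space without touching the underlying layer.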

Details

Paper ID: lrec2026-ws-udw-24
Pages: pp. 268–276
BibKey: stemle-2026-treebank
Editors: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)
Location: Palma, Mallorca, Spain
Date: 11–16 May 2026

Authors

Egon Stemle
