LREC 2026 Workshop

DiNoS: Creating a Data-Driven German Noun Phrase Lexicon from Universal Dependencies

Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)

DOI:10.63317/3ziovdhk9qac

Abstract

To foster investigations of noun phrase (NP) inflection in German at scale, this paper introduces DiNoS (Distributional Noun Structure), a data-driven lexicon of NP heads, which includes statistical information on the dependents and the morphosyntactic features of their original in-context appearances. We make available the source code for the extraction of NPs from CoNLL-U treebanks, which includes rule-based heuristics to improve feature annotation coverage and ensures a homogeneous lemmatisation strategy across treebanks. While the resulting JSON-based lexicon is suitable for no-code interaction for non-experts, it is further supported by a toolkit for the automatic calculation of, and access to, various statistical overviews. In this paper, we present the heuristics employed to extract NP datasets from the German Universal Dependencies’ Hamburg Dependency and GSD treebanks. In addition, we provide a preview of the emerging DiNoS lexica’s properties and discuss some implications of noun and determiner word form ambiguity for NP complexity.
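As a rough illustration of the kind of extraction the abstract describes, the sketch below reads CoNLL-U text, collects every token tagged `NOUN` as a candidate NP head, and counts its morphosyntactic feature strings and the dependency relations of its direct dependents before serialising the result as JSON. It is not the DiNoS source code: the sample sentence is hand-written, the function names `parse_conllu` and `build_lexicon` and the JSON keys `count`, `feats` and `dependents` are assumptions made for this example, and none of the paper's rule-based heuristics or lemmatisation steps are reproduced.

```python
import json
from collections import Counter, defaultdict

# Tiny hand-written CoNLL-U sample (illustrative only, not taken from HDT or GSD).
SAMPLE = """\
# text = Der alte Hund schläft.
1\tDer\tder\tDET\t_\tCase=Nom|Gender=Masc|Number=Sing\t3\tdet\t_\t_
2\talte\talt\tADJ\t_\tCase=Nom|Gender=Masc|Number=Sing\t3\tamod\t_\t_
3\tHund\tHund\tNOUN\t_\tCase=Nom|Gender=Masc|Number=Sing\t4\tnsubj\t_\t_
4\tschläft\tschlafen\tVERB\t_\t_\t0\troot\t_\t_
5\t.\t.\tPUNCT\t_\t_\t4\tpunct\t_\t_

"""


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "feats": cols[5],
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences


def build_lexicon(sentences):
    """Aggregate per-lemma counts for noun heads (hypothetical schema)."""
    lexicon = defaultdict(
        lambda: {"count": 0, "feats": Counter(), "dependents": Counter()}
    )
    for sent in sentences:
        for tok in sent:
            if tok["upos"] != "NOUN":
                continue
            entry = lexicon[tok["lemma"]]
            entry["count"] += 1
            entry["feats"][tok["feats"]] += 1
            # Count the relations of tokens directly attached to this noun.
            for dep in sent:
                if dep["head"] == tok["id"]:
                    entry["dependents"][dep["deprel"]] += 1
    return lexicon


lexicon = build_lexicon(parse_conllu(SAMPLE))
print(json.dumps(lexicon, ensure_ascii=False, indent=2))
```

Keying entries by lemma and storing plain counters keeps the output readable in any JSON viewer, which is in the spirit of the no-code access mentioned in the abstract; a real lexicon would presumably record more structure per occurrence.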

Details

Paper ID
lrec2026-ws-slide-17
Pages
pp. 191-201
BibKey
suchardt-etal-2026-dinos
Editors
Erhard Hinrichs (Tübingen University, Germany), Joakim Nivre (Uppsala University, Sweden), Petya Osenova (Sofia University, Bulgaria), James Pustejovsky (Brandeis University, USA), Claus Zinn (Tübingen University, Germany)
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Jacob Lee Suchardt
  • Ronja Laarmann-Quante
