Prague Dependency Treebank - Consolidated 2.0: Enriching a Complex Annotation Scheme

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/276qjpo35shu

Abstract

The Prague Dependency Treebank framework is unique in its attempt to systematically include and link different layers of language, including a meaning representation with several types of inter-sentential phenomena, especially coreference and discourse relations. We present its second consolidated version (PDT-C 2.0), which concludes an almost 30-year project of sustained development, turning the resource into a uniformly and coherently annotated, genre-diversified language resource for Czech of almost 4 million tokens, with accompanying fully compatible lexicons. In addition to supporting continuous linguistic research, the richly annotated corpus is widely used in international comparisons of the development of traditional and novel NLP tools, as well as in conversions into other formalisms. The corpus and the trained parsers are available under the CC BY-NC-SA licence.

Details

Paper ID
lrec2026-main-908
Pages
pp. 11593-11605
BibKey
mikulov-etal-2026-prague
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Marie Mikulová
  • Jiří Mírovský
  • Milan Straka
  • Pavlína Synková
  • Jan Štěpánek
  • Barbora Štěpánková
  • Jan Hajič

Links