
The Corpus of Contemporary Polish — a New Reference Corpus with Rich Syntactic Annotations

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2e37nxvjrs42

Abstract

In this paper, we describe the Corpus of Contemporary Polish (KWJP) and its rich syntactic annotation. The corpus covers a wide range of texts originally published between 2011 and 2020. Although it continues the idea of providing up-to-date reference corpora of Polish, initiated by the National Corpus of Polish (NKJP) project, the principles underlying its development differ. We outline the design choices that affect the corpus content and explain the reasons for them. The paper focuses mainly on the annotation layers in KWJP, which are generated with a neural-network-based tool developed specifically for this purpose. We describe in detail the syntactic structure annotation, which is represented by hybrid trees combining information typical of constituency and dependency trees. Finally, we provide several examples showing how annotation with hybrid trees facilitates querying and effective searching for information in the corpus.

Details

Paper ID
lrec2026-main-907
Pages
pp. 11585-11592
BibKey
kiera-etal-2026-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Witold Kieraś

  • Małgorzata Marciniak

  • Marcin Woliński

  • Katarzyna Krasnowska-Kieraś

  • Marek Łaziński

Links