Back to Main Conference 2010
LREC 2010main

The Design of Syntactic Annotation Levels in the National Corpus of Polish

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/367p2ywou7gb

Abstract

The paper presents the procedure of syntactic annotation of the National Corpus of Polish. The paper concentrates on the delimitation of syntactic words (analytical forms, reflexive verbs, discontinuous conjunctions, etc.) and syntactic groups, as well as on problems encountered during the annotation process: syntactic group boundaries, multiword entities, abbreviations, discontinuous phrases and syntactic words. It includes the complete tagset for syntactic words and the list of syntactic groups recognized in NKJP. The tagset defines grammatical classes and categories according to morphosyntactic and syntactic criteria only. Syntactic annotation in the National Corpus of Polish is limited to making constituents of combinations of words. Annotation depends on shallow parsing and manual post-editing of the results by annotators. Manual annotation is performed by two independents annotators, with a referee in cases of disagreement. The manually constructed grammar, both for syntactic words and for syntactic groups, is encoded in the shallow parsing system Spejd.

Details

Paper ID
lrec2010-main-175
Pages
N/A
BibKey
glowinska-przepiorkowski-2010-design
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • KG

    Katarzyna Głowińska

  • AP

    Adam Przepiórkowski

Links