
Implementation and Evaluation of PAROLE PoS in a National Context

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/5ftzd44ee38h

Abstract

We are annotating the complete 20-million-word Dutch PAROLE corpus with PoS and lemma. The morphosyntactic tagging of 250,000 words during the PAROLE project was the first confrontation of the fine-grained Dutch PAROLE tagset, and its 'functional' mode of application, with real corpus data. The correction of the manual tagging and the compilation of a 100,000-word training corpus for the automatic tagger initiated an evaluation of the suitability of the tagset and of the methodology of tag assignment; both topics are discussed in this paper. The reality of corpus data brought about a number of adaptations, linguistic restrictions and generalisations. The most salient tagger results are presented. Our experience is relevant for a new project: the Integrated Language Database of 8th-21st Century Dutch (ILD), which will contain a text corpus covering all these centuries. The corpus will be annotated with lemma and PoS, a process in which historical lexica will be used. Obviously, we will have to tailor the tagset and the methodology of tag assignment optimally to these purposes.

Details

Paper ID
lrec2002-main-081
Pages
N/A
BibKey
dutilh-kruyt-2002-implementation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29 May 2002 to 31 May 2002

Authors

  • Tilly Dutilh

  • Truus Kruyt
