Back to Main Conference 2002
LREC 2002main

Efficient Stochastic Part-of-Speech Tagging for Hungarian

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/2rnbnd8t8z9s

Abstract

Many of the methods developed for Western European languages and used widespread to produce annotated language resources cannot readily be applied to Central and Eastern European languages, due to the large number of novel phenomena exhibited in the syntax and morphology of these languages, which these methods have to handle but have not been designed to cope with. The process of morphological tagging when applied to Hungarian data to produce corpora annotated at least at the morphosyntactic level is most indicative of this problem: several of the algorithms (either rule-based or statistical) that have been used very successfully in other domains cannot readily be applied to a language exhibiting such a varied morphology and huge number of wordforms as Hungarian. The paper will describe a robust tagging scenario for Hungarian using a relatively simple stochastic system augmented with external morphological processing, which can overcome the two most conspcicuous problems: the complexity of morphosyntactic descriptions and most importantly the huge number of possible wordforms.

Details

Paper ID
lrec2002-main-201
Pages
N/A
BibKey
oravecz-dienes-2002-efficient
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29 May 2002 31 May 2002

Authors

  • CO

    Csaba Oravecz

  • PD

    Péter Dienes

Links