Quantitative parameters in corpus design: Estimating the optimum text size in Modern Greek language

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/4vt2iedto5j7

Abstract

The aim of this paper is to investigate the major quantitative parameters related to the definition of the optimum text size in Modern Greek corpus development. Using the Hellenic National Corpus (HNC) (Hatzigeorgiu et al., 2000) as a reference point, we estimated a number of critical statistical measures regarding feature counting in different text sizes. The results indicate that frequent linguistic features behave differently from medium-frequency and rare ones, and that an increase in text size does not affect them uniformly.

Details

Paper ID
lrec2002-main-099
Pages
N/A
BibKey
mikros-2002-quantitative
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29-31 May 2002

Authors

  • George Mikros

Links